Abstract: This is a three-part paper.

Optimality of source-channel separation for communication with
a fidelity criterion when the channel is compound as defined in
[1] and general as defined in [2] is proved in Part I. It is assumed
that random codes are permitted. The word "universal" in the
title of this paper refers to the fact that the channel model is
compound. The proof uses a layered black-box, or layered
input-output, view-point. In particular, only the end-to-end description
of the channel as being capable of communicating a source to
within a certain distortion level is used when proving separation.
This implies that the channel model does not play any role for
separation to hold as long as there is a source model. Further
implications of the layered black-box view-point are discussed.

Optimality of source-medium separation for multi-user communication
with fidelity criteria over a general, compound medium
in the unicast setting is proved in Part II, thus generalizing Part
I to the unicast, multi-user setting.

Part III gets to an understanding of the question, "Why is a
channel which is capable of communicating a source to within
a certain distortion level also capable of communicating bits at
any rate less than the infimum of the rates needed to code the
source to within the distortion level?" This question lies at the
heart of why optimality of separation for communication with a
fidelity criterion holds. The perspective taken to get to this
understanding is a randomized covering-packing perspective, and
the proof is operational.



I. Part I: point-to-point setting 
A. Introduction to, and contribution of Part I 

Optimality of separation based architectures for communi- 
cation with a fidelity criterion over a discrete memoryless 
channel is proved in [3]. Optimality refers to the fact that if 
communication of some source to within some distortion level 
can be accomplished with some architecture, communication 
of the same source to within the same distortion level can 
also be accomplished with a source-channel separation based 
architecture. This result can be generalized to indecomposable 
channels, that is, finite-state Markov channels for which the
state information dies down with time, as defined rigorously 
in [4]. Part I generalizes this optimality to the case when the
channel is compound, that is, the channel belongs to a set, as
defined in [1], and the channel transition probability is general,
as defined in [2], with the difference that the probability of
excess distortion criterion, which is essentially the same as the
criterion in [1], is used as the fidelity criterion instead of the
expected distortion criterion used in [3]. The use of the word
"universal" in the title of this paper refers to the fact that the
result holds for a compound channel. Note that the universality
is over the channel, not the source.

A generalization to the compound setting is needed because 
the action of real media like wireless medium or the internet 
cannot be modeled as a known transition probability and one 
way to model these media might be that they belong to a set of 
transition probabilities. The multi-user generalization of Part 
I in the unicast setting is the subject of Part II. 

In order to prove this optimality, encoders and decoders are 
allowed to be random. That is, the encoder is a probability 
distribution on the set of deterministic encoders, the decoder 
has access to the particular realization of the encoder, and 
based on this access, acts as a probability distribution on the 
set of deterministic decoders. Error is calculated by averaging
over the random code. This is called random coding. The
random-coding argument in [5] also uses random codes. The
difference is that there, random coding is a proof technique,
whereas in the argument in Part I, random coding is necessary
in the sense that separation is not optimal for communication
with a fidelity criterion over a general, compound channel if
random codes are not permitted.
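
The shared-randomness idea behind a random code can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes a binary channel input alphabet, uses a shared integer seed as a finite stand-in for the shared random variable, and the name `draw_random_encoder` and its parameters are hypothetical.

```python
import random
from typing import Dict, Tuple


def draw_random_encoder(shared_seed: int, num_messages: int, n: int) -> Dict[int, Tuple[int, ...]]:
    """Draw one deterministic encoder (a codebook) from a distribution on encoders.

    Assumption: the shared seed stands in for the common randomness available
    at both ends, and the channel input alphabet is binary.
    """
    rng = random.Random(shared_seed)
    return {m: tuple(rng.randint(0, 1) for _ in range(n)) for m in range(num_messages)}


# Both ends regenerate the same codebook from the shared seed, so the decoder
# knows the realization of the encoder without any extra communication.
encoder_view = draw_random_encoder(shared_seed=2024, num_messages=4, n=8)
decoder_view = draw_random_encoder(shared_seed=2024, num_messages=4, n=8)
```

Averaging the error over many seeds would correspond to calculating the error by averaging over the random code.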

The proof uses a layered black-box, or layered input-output,
view-point, and the proof style is different from that used
in [3]. The question arises: can a proof in a style similar
to [3] be used to prove the optimality of separation for
communication with a fidelity criterion when the channel is
compound? The answer is yes: Amos Lapidoth showed the first
author how to prove the result using techniques similar to [3]
in a private discussion when the first author was explaining the
result to Amos Lapidoth. A further question arises: what is the
need for a different proof technique? The answer is manifold.
First, the proof technique in [3] and the further extension due
to Amos Lapidoth require an indecomposability assumption
on the channel, whereas the proof technique in Part I works
for general channels as defined in [2]. It is for this reason that
the probability of excess distortion criterion is used instead of 
the expected distortion criterion. The use of the probability of 
excess distortion criterion instead of the expected distortion 
criterion is similar in spirit to the use of the inf-information
rate in [2] instead of mutual information: in [2], a formula for
channel capacity is given in terms of the inf-information rate,
which is the liminf in probability (see [2]) of the normalized
information density (the information density is as defined in [2]),
as compared to the usual formula for channel capacity, which
is given in terms of mutual information, which is the expected
value of the normalized information density; similarly, the
probability of excess distortion criterion is a criterion in terms
of a slight variant of the limsup in probability (see [2]) of
$\frac{1}{n} d^n(X^n, Y^n)$, whereas the expected distortion criterion is a
criterion in terms of the expectation of $\frac{1}{n} d^n(X^n, Y^n)$. The
result is similar in spirit too: optimality of separation for 
communication with a fidelity criterion can be proved for a 
general channel and correspondingly, the formula for channel 
capacity in [2] holds for a general channel. Second, the layered
black-box view-point uses only the end-to-end description of
the channel as being capable of communicating a source to
within a certain distortion level to prove separation. This
implies that the channel model does not play any role for
separation to hold as long as there is a source model. This
implication cannot be derived from a proof in the style of
[3]. This implication that the channel model does not play
any role is also true in [6], where the context is not proving
the optimality of separation but reliable communication of
bits over an individual channel. Further, the layered black-box
view-point has architectural implications. These and other
implications of the layered black-box view-point which do
not follow from the Shannon-Lapidoth view-point will be
discussed later in the paper.

Part I is joint work of all three authors. Part II and Part III are
the joint work of the first author and the second author.

B. Notation and definitions for Part I 

A superscript n will denote a quantity related to block-length
n. For example, $x^n$ will be the channel input when the
block-length is n. The only exception to this rule is for a real
number: $\omega_n$ is used for block-length n if $\omega_n$ is
a real number, in order not to confuse it with the n-th power
of $\omega$. As the block-length varies, $x = \langle x^n \rangle_1^\infty$ will denote
the sequence for various block-lengths.

The channel input space is $\mathcal{I}$ and the channel output space is
$\mathcal{O}$. $\mathcal{I}$ and $\mathcal{O}$ are finite sets. The channel is a sequence
$k = \langle k^n \rangle_1^\infty$ where

$$k^n : \mathcal{I}^n \to \mathcal{P}(\mathcal{O}^n) \qquad (2)$$

and $\mathcal{P}(\mathcal{O}^n)$ denotes the set of probability distributions on $\mathcal{O}^n$.
When the block-length is n, the channel acts as $k^n(\cdot|\cdot)$;
$k^n(o^n|\iota^n)$ is the probability that the channel output is $o^n$ if the
channel input is $\iota^n$. This model is the same as the model in [2],
which is the same as the model at the top of Page 100 of [1].
The model of a "physical" channel would impose causality
and nestedness among the various $k^n$, and this would be a
special case of the above model. A compound channel is a set
$\mathcal{A}$ of channels with input space $\mathcal{I}$ and output space $\mathcal{O}$,
and is denoted by $k \in \mathcal{A}$.
This model of a compound channel is the same as the model
of a compound channel defined in [1], though the emphasis
in [1] is on compound DMCs. This model of a compound
channel also generalizes arbitrarily varying channels defined
in [1].



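The source input space is $\mathcal{X}$ and the source reproduction space
is $\mathcal{Y}$. $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. X is a random variable on $\mathcal{X}$.
$X = \langle X^n \rangle_1^\infty$ is the i.i.d. X source, where $X^n$ is the i.i.d. X
sequence of length n. $d : \mathcal{X} \times \mathcal{Y} \to [0, \infty)$ is a distortion
function. The n-letter distortion function is defined additively:

$$d^n(x^n, y^n) = \sum_{i=1}^{n} d(x^n(i), y^n(i)) \qquad (3)$$

where $x^n(i)$ and $y^n(i)$ denote the i-th components of $x^n$ and $y^n$.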
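
As a small illustration of the additive n-letter distortion in (3), the following Python sketch computes it for the Hamming distortion on a binary alphabet. The choice of Hamming distortion and the function names are assumptions made for the example only.

```python
from typing import Callable, Sequence


def hamming(x: int, y: int) -> float:
    """Single-letter Hamming distortion: 0 if the letters agree, 1 otherwise."""
    return 0.0 if x == y else 1.0


def n_letter_distortion(d: Callable[[int, int], float],
                        xs: Sequence[int], ys: Sequence[int]) -> float:
    """Additive n-letter distortion: the sum of the single-letter distortions."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return sum(d(x, y) for x, y in zip(xs, ys))


x_seq = [0, 1, 1, 0, 1]
y_seq = [0, 1, 0, 0, 0]
total = n_letter_distortion(hamming, x_seq, y_seq)   # two positions differ
average = total / len(x_seq)                         # normalized distortion
```

The normalized quantity `average` plays the role of $\frac{1}{n} d^n(x^n, y^n)$, which is what the fidelity criteria in this paper are stated in terms of.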

Communication of the i.i.d. X source over $k \in \mathcal{A}$ requires
an encoder and a decoder. When the block-length is n, a deterministic
encoder is a map $e^n : \mathcal{X}^n \to \mathcal{I}^n$ and a deterministic
decoder is a map $f^n : \mathcal{O}^n \to \mathcal{Y}^n$. The encoder and decoder can
be random; a random encoder-decoder can be defined in two
ways:

• The encoder is a probability distribution on the space of
deterministic encoders, and the decoder, based on the
knowledge of the encoder realization, acts as a transition
probability from the set of deterministic encoders to the
set of probability distributions on the set of deterministic
decoders.

• A joint probability distribution on the space of deterministic
encoders and decoders.

A random encoder-decoder can be realized if there is a
shared continuous valued random variable independent of all
other random variables in the system at both the encoder
and the decoder. A precise discussion of a random
encoder-decoder is omitted. The encoder-decoder sequence is denoted
by $\langle e^n, f^n \rangle_1^\infty$. If the input to the encoder $e^n$ is $X^n$,
the composition $e^n \circ k^n \circ f^n$ produces an output $Y^n$ which
might depend on the particular $k \in \mathcal{A}$. Channel $k \in \mathcal{A}$ is
said to be capable of communicating (end-to-end) the i.i.d. X
source to within a distortion D if there exist an encoder-decoder
$\langle e^n, f^n \rangle_1^\infty$ independent of the particular $k \in \mathcal{A}$ and a
non-negative real sequence $\langle \omega_n \rangle_1^\infty$, $\lim_{n \to \infty} \omega_n = 0$, independent
of the particular $k \in \mathcal{A}$ such that

$$\Pr\left(\frac{1}{n} d^n(X^n, Y^n) > D\right) \le \omega_n \quad \forall k \in \mathcal{A} \qquad (4)$$

The sequence $\langle \omega_n \rangle_1^\infty$ causes a uniformity in the rate at which the
probability of excess distortion tends to 0 as the block-length tends
to $\infty$ over $k \in \mathcal{A}$.
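
The role of the uniform bound in (4) can be seen in a hedged Monte Carlo sketch. It is an illustration only and makes several assumptions that are not in the paper: the source is binary, it is sent uncoded, the compound set consists of two binary symmetric channels, and the distortion is Hamming. All names are hypothetical.

```python
import random


def excess_distortion_probability(n, crossover, level, trials, rng):
    """Estimate Pr((1/n) d^n(X^n, Y^n) > level) for uncoded transmission over a BSC.

    With Hamming distortion and uncoded transmission, the distortion of a block
    equals the number of bits the channel flips, whatever the source bits are.
    """
    exceed = 0
    for _ in range(trials):
        flips = sum(1 for _ in range(n) if rng.random() < crossover)
        if flips / n > level:
            exceed += 1
    return exceed / trials


def uniform_bound(n, crossovers, level, trials, seed=0):
    """Worst case over the compound set: an empirical stand-in for omega_n."""
    rng = random.Random(seed)
    return max(excess_distortion_probability(n, q, level, trials, rng)
               for q in crossovers)


compound_set = [0.05, 0.10]   # assumed crossover probabilities of the BSCs in the set
omega_short = uniform_bound(20, compound_set, level=0.15, trials=200)
omega_long = uniform_bound(2000, compound_set, level=0.15, trials=200)
```

Taking the maximum over the set mirrors the requirement that a single sequence bounds the excess distortion probability for every channel in the compound set; in this toy setting the estimate shrinks as the block-length grows because the distortion level exceeds every crossover probability in the set.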

In a separation-based architecture, the encoder is the compo- 
sition of a source encoder and the channel encoder and the 
decoder is the composition of a channel decoder and a source 
decoder. The action of the source encoder-source decoder pair
which is used depends on the source and not on the channel. 
The action of the channel encoder-channel decoder pair which 
is used depends on the channel and not on the source. In 
this sense, the source encoder-source decoder pair and the 
channel encoder-channel decoder pair act in a way to separate 
the source and the channel. Further, the communication over 
the channel with the help of channel encoder and channel 
decoder is reliable communication; fixing this notion of
communication independent of the channel leads to architectural
standardization in the sense that irrespective of the channel of
communication, reliable communication is carried out over the
channel as part of the end-to-end communication: this point
is of importance in the multi-user setting; see Chapter 1 of
Q. Also see Chapter 1 of [7] for a more detailed high-level
discussion of separation and its importance. Mathematically,
separation architectures are discussed below.

Let

$$\mathcal{M}_R^n \triangleq \{1, 2, \ldots, 2^{\lfloor nR \rfloor}\} \qquad (5)$$

be the message set. When the block-length is n, a rate R
deterministic source encoder is $e^n : \mathcal{X}^n \to \mathcal{M}_R^n$ and a rate R
deterministic source decoder is $f^n : \mathcal{M}_R^n \to \mathcal{Y}^n$. $(e^n, f^n)$ is the
block-length n rate R deterministic source-code. The source-code
is allowed to be random in the sense discussed previously.
$\langle e^n, f^n \rangle_1^\infty$ is the rate R source-code. An example of
a source code used in information-theoretic arguments is one
which generates codewords i.i.d. from a particular source: this
is an example of a random source-code. The source-code
$\langle e^n, f^n \rangle_1^\infty$ is said to code the i.i.d. X source to within
a distortion D if, with input $X^n$ to $e^n \circ f^n$, the output is $Y^n$
such that

$$\lim_{n \to \infty} \Pr\left(\frac{1}{n} d^n(X^n, Y^n) > D\right) = 0 \qquad (6)$$

The above criterion is the probability of excess distortion
criterion and is essentially the same as the criterion used in
Chapter 2, §2 of [1]. The infimum of rates needed to code
the i.i.d. X source to within the distortion D when the distortion
function is d under the probability of excess distortion
criterion is the rate-distortion function $R^P_X(D)$; note that the
dependence of the rate-distortion function on d is suppressed
in the notation. The above is the operational definition of the
rate-distortion function. The rate-distortion function can also be
defined information-theoretically:

$$R^I_X(D) = \inf_{\{p_{Y|X} \,:\, \sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p_X(x)\, p_{Y|X}(y|x)\, d(x,y) \le D\}} I(X;Y) \qquad (7)$$

An expression for $R^P_X(D)$ is $R^I_X(D)$. This is because $R^P_X(D)$
is equal to the rate-distortion function R(D) defined in Chapter 2,
§2 of [1]: this follows from the similarity in distortion criteria
used to define $R^P_X(D)$ and R(D), and it is proved in [1] that
an expression for R(D) is $R^I_X(D)$.
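
For a concrete instance of the information-theoretic rate-distortion function in (7), the following Python sketch evaluates the well-known closed form for a Bernoulli(p) source under Hamming distortion, $h(p) - h(D)$ for $D < \min(p, 1-p)$ and 0 otherwise, where h is the binary entropy function. The binary source and Hamming distortion are assumptions chosen for illustration; the function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_distortion_bernoulli(p: float, distortion: float) -> float:
    """Rate-distortion function of a Bernoulli(p) source under Hamming distortion."""
    if distortion < 0:
        raise ValueError("distortion level must be non-negative")
    if distortion >= min(p, 1 - p):
        return 0.0   # the distortion level is met without sending any bits
    return binary_entropy(p) - binary_entropy(distortion)


rate_lossless = rate_distortion_bernoulli(0.5, 0.0)    # 1 bit per source symbol
rate_quarter = rate_distortion_bernoulli(0.5, 0.25)    # 1 - h(0.25)
```

The function is decreasing in the distortion level, which is the monotonicity that comparisons of rate-distortion values at different distortion levels rely on.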

When the block-length is n, a rate R deterministic channel
encoder is a map $e^n : \mathcal{M}_R^n \to \mathcal{I}^n$ and a rate R deterministic
channel decoder is a map $f^n : \mathcal{O}^n \to \hat{\mathcal{M}}_R^n$, where
$\hat{\mathcal{M}}_R^n = \mathcal{M}_R^n \cup \{e\}$ is the message reproduction set and 'e' denotes
error. The encoder and decoder are allowed to be random in
the sense discussed previously. $\langle e^n, f^n \rangle_1^\infty$ is the rate R
channel code. The classic argument used in [5] to derive the
achievability of the mutual information expression for channel 
capacity uses a random code. Denote 

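$$\mathcal{G}_{\mathcal{A}} = \{\langle e^n \circ k^n \circ f^n \rangle_1^\infty : k \in \mathcal{A}\} \qquad (8)$$

$g \in \mathcal{G}_{\mathcal{A}}$ is a compound channel with input space $\langle \mathcal{M}_R^n \rangle_1^\infty$
and output space $\langle \hat{\mathcal{M}}_R^n \rangle_1^\infty$. Rate R is said to be reliably
achievable over $k \in \mathcal{A}$ if there exist a rate R channel code
$\langle e^n, f^n \rangle_1^\infty$ and a sequence $\langle \delta_n \rangle_1^\infty$, $\delta_n \to 0$ as $n \to \infty$,
such that

$$\sup_{m^n \in \mathcal{M}_R^n} g^n(\{m^n\}^c \mid m^n) \le \delta_n \quad \forall k \in \mathcal{A} \qquad (9)$$

$\langle \delta_n \rangle_1^\infty$ causes a uniformity in the rate at which the maximal
error probability tends to zero as the block-length tends to $\infty$ and
plays a role similar to the sequence $\langle \omega_n \rangle_1^\infty$ in (4). Since
random codes are permitted, if rate R is achievable, so is
any rate less than R. The supremum of all achievable rates is the
capacity of $k \in \mathcal{A}$.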

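$\langle e_s^n \circ e_c^n, f_c^n \circ f_s^n \rangle_1^\infty$, the composition of a source code $\langle e_s^n, f_s^n \rangle_1^\infty$ with a channel code $\langle e_c^n, f_c^n \rangle_1^\infty$, is the separation-based encoder-decoder.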

The question is: if there exists some architecture to communicate
the i.i.d. X source to within a distortion D over $k \in \mathcal{A}$,
does there also exist a separation architecture to communicate
the i.i.d. X source to within a distortion D over $k \in \mathcal{A}$? Note
that this is an end-to-end question regarding communication
of the i.i.d. X source over $k \in \mathcal{A}$ and not a question
about just source-coding or channel-coding. Under certain
assumptions, this question is answered in the affirmative in
the next subsection.

C. Optimality of separation for communication with a fidelity 
criterion over a general, compound channel 

1) Theorem and proof:

Theorem I.1 (Optimality of separation). Assume that random
codes are permitted. Let $k \in \mathcal{A}$ be capable of communicating
the i.i.d. X source to within a distortion level D under an
additive distortion function d. Then, reliable communication
can be accomplished over k at rates $< R^P_X(D)$. This reliable
communication can be accomplished with consumption of
channel resources which is the same as the channel resource
consumption in the original architecture which communicates
the i.i.d. X source to within a distortion D over $k \in \mathcal{A}$.

Further, if reliable communication can be accomplished over
$k \in \mathcal{A}$ at a certain rate strictly $> R^P_X(D)$, then the i.i.d. X
source can be communicated to within a distortion D over
$k \in \mathcal{A}$ by use of a separation architecture. The channel resource
consumption in this separation architecture is the same as the
channel resource consumption in the architecture for reliable
communication at rate strictly $> R^P_X(D)$ when the distribution
on the message set is uniform.

Proof: $k \in \mathcal{A}$ is capable of communicating the i.i.d. X
source to within a distortion D with the help of some
encoder-decoder $\langle e^n, f^n \rangle_1^\infty$. Consider the channel set

$$\mathcal{C}_{\mathcal{A}} = \{\langle e^n \circ k^n \circ f^n \rangle_1^\infty : k \in \mathcal{A}\} \qquad (10)$$

$c = \langle c^n \rangle_1^\infty \in \mathcal{C}_{\mathcal{A}}$ is a compound channel with input space
$\mathcal{X}$ and output space $\mathcal{Y}$. It will be proved that by use of some
encoder-decoder $\langle E^n, F^n \rangle_1^\infty$, reliable communication can
be accomplished over $c \in \mathcal{C}_{\mathcal{A}}$ at rates $< R^P_X(D)$ with
consumption of the same channel resources as the architecture
$\langle e^n \circ k^n \circ f^n \rangle_1^\infty$ when used for communicating the i.i.d.
X source to within a distortion D. It will follow that by
use of the encoder-decoder $\langle E^n \circ e^n, f^n \circ F^n \rangle_1^\infty$, reliable
communication can be accomplished over $k \in \mathcal{A}$ at rates
$< R^P_X(D)$ with consumption of the same channel resources as the
architecture $\langle e^n \circ k^n \circ f^n \rangle_1^\infty$ when used for communicating
the i.i.d. X source to within a distortion D.

This will be done by use of a random-coding argument. Let 
the block-length be n. 

Codebook generation: Generate $2^{\lfloor nR \rfloor}$ codewords i.i.d. X.
This is the codebook $\mathcal{K}^n$ which the encoder $E^n$ uses.

Joint typicality: $(x^n, y^n)$, $x^n \in \mathcal{X}^n$, $y^n \in \mathcal{Y}^n$, are said to be
$\epsilon$-jointly typical if

1) $x^n$ is $\epsilon$-$p_X$-typical, that is, $x^n \in T(p_X, \epsilon)$

2) $\frac{1}{n} d^n(x^n, y^n) \le D$

Decoding: Let $y^n$ be received as the output of the channel. If
there exists a unique $x^n$ in the codebook $\mathcal{K}^n$ such that $(x^n, y^n)$ is
$\epsilon$-jointly typical, declare that $x^n$ was transmitted; else declare
error. This is the decoder $F^n$.
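
The codebook generation and decoding steps above can be sketched in Python for a toy case. This is an illustration under stated assumptions, not the construction analyzed in the appendix: the source is Bernoulli(1/2), the distortion is Hamming, typicality of a codeword is checked through its fraction of ones, and the end-to-end system is replaced by a hypothetical `black_box` that flips a fixed number of positions so that the normalized distortion never exceeds the level.

```python
import random


def make_codebook(n, rate, rng):
    """Generate 2^floor(nR) codewords i.i.d. from a Bernoulli(1/2) source."""
    size = 2 ** int(n * rate + 1e-9)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(size)]


def is_typical(x, eps):
    """Check that the empirical fraction of ones is within eps of 1/2."""
    return abs(sum(x) / len(x) - 0.5) <= eps


def hamming_count(x, y):
    """Number of positions in which the two sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def decode(codebook, y, level, eps):
    """Return the unique codeword index passing both tests, else None (error)."""
    n = len(y)
    matches = [m for m, x in enumerate(codebook)
               if is_typical(x, eps) and hamming_count(x, y) <= level * n + 1e-9]
    return matches[0] if len(matches) == 1 else None


def black_box(x, level, rng):
    """Hypothetical end-to-end system: flips exactly floor(level * n) positions."""
    n = len(x)
    flipped = set(rng.sample(range(n), int(level * n + 1e-9)))
    return [b ^ 1 if i in flipped else b for i, b in enumerate(x)]


rng = random.Random(1)
n, rate, level, eps = 60, 0.1, 0.1, 0.3
codebook = make_codebook(n, rate, rng)
results = []
for message in range(len(codebook)):
    y = black_box(codebook[message], level, rng)
    results.append(decode(codebook, y, level, eps) == message)
success_rate = sum(results) / len(results)
```

The rate 0.1 used here is below $1 - h(0.1) \approx 0.53$, the rate-distortion value of the Bernoulli(1/2) source at distortion 0.1, so wrong codewords are very unlikely to pass the distortion test. Note that the decoder uses only the end-to-end distortion guarantee of `black_box`, not its internal structure.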

With this encoding-decoding procedure, it can be proved that
reliable communication can be accomplished over $c \in \mathcal{C}_{\mathcal{A}}$ at
rates $< R^P_X(D)$: the error analysis argument for this is in
Appendix A. This proves that reliable communication can be
accomplished over $k \in \mathcal{A}$ at rates $< R^P_X(D)$.

Next, the channel resource consumption required in this architecture
for reliable communication is considered. The encoder
generates i.i.d. X codewords. Thus, the input to $c^n = e^n \circ k^n \circ f^n$
is i.i.d. X, which is the same as the input $X^n$ to the original
architecture $e^n \circ k^n \circ f^n$. It follows that the input to the channel
$k^n$ in the architecture for reliable communication is $I'^n$, which
is the same in distribution as the input $I^n$ to $k^n$ in the original
architecture. It follows that the channel resource consumption
in the architecture for reliable communication is the same as
the channel resource consumption in the original architecture.
Note that $I^n$ might depend on the particular $k \in \mathcal{A}$, but this is
irrelevant to the argument.

This proves the first part of the theorem. 

The proof of the second part of the theorem is the usual argument
of source coding followed by channel coding. Briefly,
the argument is the following: let rate $R^P_X(D) + \epsilon$ be reliably
achievable over $k \in \mathcal{A}$. Encode the i.i.d. X source to within
a distortion D at rate $R^P_X(D) + \epsilon$. Communicate this rate
$R^P_X(D) + \epsilon$ message reliably over $k \in \mathcal{A}$. Then, source-decode
the message reproduction. End-to-end, the i.i.d. X source is
communicated to within a distortion D over $k \in \mathcal{A}$. A detailed
argument is omitted. Since random codes are permitted, by
using a symmetrically permuted codebook, the distribution on
the output of the source encoder can be made uniform. It
follows by an argument similar to the resource consumption
argument before that the channel resource consumption in the
separation architecture is the same as the channel resource
consumption in the architecture for reliable communication at
rate $R^P_X(D) + \epsilon$ when the distribution on the message set is
uniform. A full argument is omitted.



This finishes the proof of the theorem. ■ 

Random codes are essential: an example of the failure of the
optimality of separation for communication with a fidelity
criterion when the channel is general but random codes are
not permitted is in [12]. The authors conjecture that random
codes are not needed if $\mathcal{A}$ is compact.

2) Extensions: The source and channel have been assumed to
evolve on the same time scale in Theorem I.1. This assumption
has been made only for mathematical convenience and can be
removed.

Theorem I.1 can be generalized to continuous valued alphabets:
discretize the continuous valued space and then take a limit
as the discretization size $\to 0$; some details
are in [8].



The authors conjecture that Theorem I.1 can be generalized to
"many" stationary ergodic sources.



The authors conjecture that Theorem I.1 can be generalized to
continuous time source and channel evolution. Generalization
to continuous time channel evolution: what matters in the proof
of Theorem I.1 is the end-to-end description that the i.i.d. X
source is communicated to within a distortion D, and whether
the channel evolves in continuous time or discrete time is
immaterial. The usual approach to handle sources evolving in
continuous time is to assume the source is band-limited and use
the sampling theorem to make the source evolve in discrete time.
This approach does not work because an additive distortion
function in continuous time (this would be an integral over
time instead of a summation over discrete time as defined in
(3)) does not remain additive after sampling. The arguments
would need to be carried out either in the continuous time
domain or by discretizing time and taking a limit as the time
discretization $\to 0$. Ideas from [9] might be helpful.

The authors conjecture that universality in Theorem I.1 can
be extended to the source; ideas from [10] and [11] might be
helpful.

D. A note on, and implications of, the layered black-box view- 
point 

The proof of Theorem I.1 uses a layered black-box view-point:

• The proof is layered in nature because the encoder-decoder
$\langle E^n, F^n \rangle_1^\infty$ is layered on top of the original architecture
$\langle e^n \circ k^n \circ f^n \rangle_1^\infty$.

• The proof is black-box in nature because it uses only
the end-to-end description of $\langle e^n \circ k^n \circ f^n \rangle_1^\infty$ as
communicating the i.i.d. X source to within a distortion
D.

Thus, a relationship is seen between two major constructs 
of information theory, namely channel capacity and the rate- 
distortion function, and how they are related in the black-box 



The first part of Theorem I.1 is the converse. This converse
is proved in [3] when the channel is a DMC. That proof
uses definitions and properties involving entropy and mutual
information. Such proof techniques are generally used for
proving converse results in information theory. The proof in
Part I uses a random-coding argument and thus proves a
converse using an achievability technique. This is interesting
in its own right.

The question arises: can a proof similar to [3] be given to
prove the first part of Theorem I.1 for a compound DMC?
The answer is yes, and this proof is due to Amos Lapidoth;
this proof was shown by Amos Lapidoth to the first author in
a private discussion when the first author was communicating
Theorem I.1 to Amos Lapidoth. This proof is in Appendix B
as written by the first author, and has been included with
the permission of Amos Lapidoth. In a nutshell, the proof is:
Use the expected distortion criterion instead of the probability
of excess distortion criterion and denote the rate-distortion
function under the expected distortion criterion by $R^E_X(D)$.
Define



$$C^I(k \in \mathcal{A}) = \sup_{Q \in \mathcal{P}(\mathcal{I})} \inf_{k \in \mathcal{A}} I(Q, k) \qquad (11)$$



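where k is now a DMC and k in the above expression denotes
the transition probability corresponding to the DMC. Let k
be capable of communicating the i.i.d. X source to within a
distortion D under the expected distortion criterion, that is,
there exist an encoder-decoder $\langle e^n, f^n \rangle_1^\infty$ independent of the
particular $k \in \mathcal{A}$ and $\langle \omega_n \rangle_1^\infty$, $\lim_{n \to \infty} \omega_n = 0$, independent
of the particular $k \in \mathcal{A}$ such that with input $\langle X^n \rangle_1^\infty$ to
$\langle e^n \circ k^n \circ f^n \rangle_1^\infty$, the output is $\langle Y^n \rangle_1^\infty$ such that

$$E\left[\frac{1}{n} d^n(X^n, Y^n)\right] \le D + \omega_n \quad \forall k \in \mathcal{A} \qquad (12)$$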



Prove that $C^I(k \in \mathcal{A}) \ge R^I_X(D)$ by use of definitions and
computations involving entropy and mutual information: this
proof closely parallels the proof in [3]. From the coding theorem
for compound DMCs, the capacity of $k \in \mathcal{A}$ is $C^I(k \in \mathcal{A})$.
From [3], $R^E_X(D) = R^I_X(D)$. Thus, the
capacity of $k \in \mathcal{A}$ is $\ge R^E_X(D)$. This finishes the proof. This
proof does not use layering on top of the existing architecture,
nor does it use a black-box perspective. In what follows, this
proof/view will be called the Shannon-Lapidoth proof/view.
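
To make the sup-inf expression in (11) concrete, the following Python sketch evaluates it numerically for a small compound set of binary symmetric channels. The set of channels, the restriction to binary inputs, and the grid search over input distributions are all assumptions for illustration; the grid search is a crude numerical stand-in and not a method used in the paper.

```python
import math


def mutual_information(q, channel):
    """I(Q, k) in bits for input distribution q and row-stochastic matrix channel."""
    outputs = range(len(channel[0]))
    p_out = [sum(q[x] * channel[x][y] for x in range(len(q))) for y in outputs]
    total = 0.0
    for x in range(len(q)):
        for y in outputs:
            joint = q[x] * channel[x][y]
            if joint > 0:
                total += joint * math.log2(channel[x][y] / p_out[y])
    return total


def bsc(a):
    """Transition matrix of a binary symmetric channel with crossover probability a."""
    return [[1 - a, a], [a, 1 - a]]


def compound_capacity_binary(channels, grid=100):
    """Grid search for the sup over Q of the inf over k of I(Q, k); binary input only."""
    best = 0.0
    for i in range(grid + 1):
        q = [i / grid, 1 - i / grid]
        worst = min(mutual_information(q, k) for k in channels)
        best = max(best, worst)
    return best


compound_set = [bsc(0.05), bsc(0.10), bsc(0.20)]
capacity = compound_capacity_binary(compound_set)
```

For this particular set, the uniform input is optimal for every channel, so the value is determined by the noisiest channel and equals $1 - h(0.2)$. In general the supremum and infimum do not decouple this way, which is why the order of sup and inf in (11) matters.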

A further question arises: what is the need of a different proof 
technique if the old proof technique can be used to prove 
the result. The rest of this subsection discusses various points 
which come out of the layered black-box proof and which are 
interesting in their own right, and further discusses the fact 
that they do not come out of the Shannon-Lapidoth proof. 

The layered black-box proof holds for a general channel. The
Shannon-Lapidoth proof can be generalized to indecomposable
channels, but some ergodicity assumption is required: an
example of a non-indecomposable channel for which separation
for communication with a fidelity criterion fails to
hold when the expected distortion criterion is used can be
found on Page 1 of [2], "Consider a binary channel where the
output codeword is equal to the transmitted codeword with
probability 1/2 and independent of the transmitted codeword
with probability 1/2": if the distortion is Hamming distortion
and the input to this channel is a rate 1 bit stream, the expected
distortion is 0.25, $R^E_X(0.25) > 0$, but the capacity of this channel
is zero, and this implies failure of optimality of separation.
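
The behaviour of this example channel can be checked with a short simulation. The sketch below is illustrative only; it assumes that in the "independent" case the output bits are uniform, and the block-length, number of trials and threshold 0.3 are arbitrary choices.

```python
import random


def block_distortion(n, rng):
    """Average Hamming distortion of one block sent over the example channel."""
    x = [rng.getrandbits(1) for _ in range(n)]
    if rng.random() < 0.5:
        y = list(x)                                   # output equals the input
    else:
        y = [rng.getrandbits(1) for _ in range(n)]    # output independent of the input
    return sum(a != b for a, b in zip(x, y)) / n


rng = random.Random(7)
samples = [block_distortion(200, rng) for _ in range(2000)]
expected_distortion = sum(samples) / len(samples)
excess_probability = sum(s > 0.3 for s in samples) / len(samples)
```

The empirical mean should be close to 0.25, in line with the expected distortion stated above, while the per-block distortion is either 0 or close to 0.5. The fraction of blocks whose distortion exceeds 0.3 therefore stays near 1/2 and does not vanish with the block-length, which suggests why this channel meets the expected distortion criterion at level 0.25 but not the probability of excess distortion criterion.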

The layered black-box perspective implies that for the optimality
of separation for communication with a fidelity criterion
to hold, the channel model does not play any role as long
as there is a source model: the only requirement is that
$\langle e^n \circ k^n \circ f^n \rangle_1^\infty$ communicates the i.i.d. X source to
within a distortion D. This does not follow from the
Shannon-Lapidoth proof: the channel is assumed to be a DMC or
indecomposable.

The layered black-box proof is semi-operational. It is operational
in the sense that it uses only the operational meaning of
channel capacity as the maximum rate of reliable communication.
However, it is only semi-operational because it uses the
definition of the information-theoretic rate-distortion function
$R^I_X(D)$ and the equality of the operational rate-distortion
function $R^P_X(D)$ and $R^I_X(D)$ crucially. The Shannon-Lapidoth
proof is not operational: it uses the information-theoretic
channel capacity $C^I(k \in \mathcal{A})$ and the information-theoretic
rate-distortion function $R^I_X(D)$ and their equality with the operational
channel capacity of $k \in \mathcal{A}$ and the operational rate-distortion
function $R^E_X(D)$, respectively, crucially.

The layered black-box view-point gives insights into layered
architectures for communication. Consider a system s which
communicates the i.i.d. X source to within a distortion D
under the probability of excess distortion criterion. Let X'
be an i.i.d. source such that $R^P_{X'}(D') < R^P_X(D)$. Then, the
i.i.d. X' source can be communicated over the system s to
within a distortion D' by layering, and thus, the system s
does not need to be "broken". The rough proof is: code the
source X' to within a distortion D'. The output is a rate
$R^P_{X'}(D')$ bit stream. This bit stream can be communicated
over s by layering on top, by the first part of Theorem I.1 and
its proof, since $R^P_{X'}(D') < R^P_X(D)$. Finally, decode the source.
This layered architecture does not come out of the
Shannon-Lapidoth view-point: the only conclusion that can be drawn
is that the i.i.d. X' source can be communicated to within a
distortion D' under the expected distortion criterion over an
indecomposable sub-system of s.

A further application of the layered black-box view-point is 
in Part III: Part III answers the question, "Why is a channel 
which is capable of communicating a source to within a certain 
distortion level, capable of communicating bits at any rate less 
than the infimum of the rates needed to code the source to 
within the distortion level," by use of a layered black-box, 
randomized covering-packing perspective. This question lies 
at the heart of why separation holds for communication with 
a fidelity criterion. 



E. Recapitulation, speculation and further development for 
Part I 

Optimality of source-channel separation for communication
with a fidelity criterion over a general, compound channel
was proved. It was assumed that random codes are permitted.
The probability of excess distortion criterion was used as the
fidelity criterion. The distortion function was additive.

The proof used a layered black-box view-point. A proof due
to Amos Lapidoth, based on the original separation argument
of [3], was provided for a compound DMC. Various implications
of the layered black-box proof technique were discussed. It was
also discussed how these implications do not come out of the
Shannon-Lapidoth proof technique.



Theorem I.1 is generalized to the unicast multi-user set-up
in Part II. The Shannon-Lapidoth proof technique is not
operational and the layered black-box proof technique is
semi-operational. A fully operational proof is provided in Part III.
This proof in Part III also gets to the heart of the question,
"Why is a channel which is capable of communicating a source
to within a certain distortion level, capable of communicating
bits at any rate less than the infimum of the rates needed to
code the source to within the distortion level?" This question
lies at the heart of optimality of separation for communication
with a fidelity criterion.

II. Part II: multi-user setting 

A. Introduction to, and contribution of Part II 

In Part II, the optimality of source-medium separation for
multi-user communication with fidelity criteria over a general,
compound medium in the unicast set-up is proved. This
generalizes the result of Part I. Unicast set-up means that the sources
which various users want to communicate to each other are
independent of each other. This will be a simple generalization
of Part I. As in Part I, random codes are permitted. As in
Part I, universality of the result refers to the fact that the
medium model is compound. Further, as is the case in Part
I, universality is over the medium, not the source.

Separation in Part II refers to source-medium separation as
opposed to the separation of channel-coding and
network-coding found in the network-coding literature [13].

In [14], the optimality of separation for communication with a
fidelity criterion over a finite memory medium is proved. The
result in Part II is a generalization of [14]: it holds for a general,
compound medium, with the difference that [14]
uses the expected distortion criterion whereas Part II uses
the probability of excess distortion criterion. The theorem in
Part II was proved simultaneously with and independently of [15].
A generalization to the compound setting is needed because
the action of real media like wireless medium or the internet
cannot be modeled as a known transition probability and one
way to model these media might be that they belong to a set
of transition probabilities.

[15] provides examples where source-medium separation is
not optimal for communication with fidelity criteria when the
sources are correlated. Thus, the unicast setting is the extent
to which the optimality of separation for communication with
fidelity criteria can be expected to hold in a general framework.



B. Notation and definitions for Part II 

Notation and definitions from Subsection I-B of Part I will be
used. Recall, in particular, the superscript notation.

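There are N users: users i, $1 \le i \le N$. The medium input
space at user i is $\mathcal{I}_i$ and the medium output space at user i
is $\mathcal{O}_i$. $\mathcal{I}_i$, $1 \le i \le N$, and $\mathcal{O}_i$, $1 \le i \le N$, are finite sets. The
medium is a sequence $m = \langle m^n \rangle_1^\infty$ where

$$m^n : \prod_{i=1}^N \mathcal{I}_i^n \to \mathcal{P}\left(\prod_{i=1}^N \mathcal{O}_i^n\right), \quad (\iota_i^n, 1 \le i \le N) \mapsto m^n(\,\cdot \mid \iota_i^n, 1 \le i \le N) \qquad (13)$$

When the block-length is n, the medium acts as $m^n(\cdot|\cdot)$;
$m^n(o_i^n, 1 \le i \le N \mid \iota_i^n, 1 \le i \le N)$ is the probability that
the medium output at user i is $o_i^n$, $1 \le i \le N$, if the medium
input at user i is $\iota_i^n$, $1 \le i \le N$. A compound medium is
a set $\mathcal{B}$ of media with input space $\mathcal{I}_i$ at user i, $1 \le i \le N$,
and output space $\mathcal{O}_i$ at user i, $1 \le i \le N$, and is denoted by
$m \in \mathcal{B}$.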

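The source input spaces are $\mathcal{X}_{ij}$, $1 \le i,j \le N$, and the source
output spaces are $\mathcal{Y}_{ij}$, $1 \le i,j \le N$, respectively. $\mathcal{X}_{ij}$, $\mathcal{Y}_{ij}$ are
finite sets. $X_{ij} = \langle X_{ij}^n \rangle_1^\infty$ is the i.i.d. $X_{ij}$ source: $X_{ij}^n$ is the
i.i.d. $X_{ij}$ sequence of length n. The i.i.d. $X_{ij}$ source needs
to be communicated from user i to user j. $d_{ij} : \mathcal{X}_{ij} \times \mathcal{Y}_{ij} \to [0, \infty)$
is the distortion function for communication from user i
to user j. The n-letter distortion function is defined additively:

$$d_{ij}^n(x_{ij}^n, y_{ij}^n) = \sum_{t=1}^n d_{ij}\left(x_{ij}^n(t), y_{ij}^n(t)\right) \qquad (14)$$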
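As a minimal sketch of the additive definition (14), the following Python fragment sums a single-letter distortion over the positions of a source sequence and its reproduction. The function names and the Hamming single-letter distortion are assumptions made for illustration only; the paper allows any single-letter distortion function with values in [0, ∞).

```python
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def additive_distortion(x: Sequence[X], y: Sequence[Y],
                        d: Callable[[X, Y], float]) -> float:
    """n-letter distortion: the sum of single-letter distortions d(x(t), y(t))."""
    if len(x) != len(y):
        raise ValueError("source and reproduction sequences must have equal length")
    return sum(d(a, b) for a, b in zip(x, y))


def hamming(a: int, b: int) -> float:
    """Hypothetical single-letter distortion: 0 if the letters agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


x_n = [0, 1, 1, 0]
y_n = [0, 1, 0, 0]
total = additive_distortion(x_n, y_n, hamming)  # one mismatch, at the third position
per_letter = total / len(x_n)                   # normalized by the block-length n
```

The normalized value `per_letter` is the quantity that is compared with the distortion level in the probability of excess distortion criterion.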



Communication of the sources $X_{ij}$ over $m \in \mathcal{B}$ requires
modems $h_i = \langle h_i^n \rangle_1^\infty$ at user i. The modems are allowed to
collectively generate random codes; such random codes can
be realized if there is a shared continuous valued random
variable, independent of all other random variables in the
system, available at all the modems. The exact model of the
modems is irrelevant; it suffices to say that the interconnection
of medium and modems can be thought of as a different
compound medium $w = \langle w^n \rangle_1^\infty \in \mathcal{M}_\mathcal{B}$:

$$w^n : \prod_{i,j=1}^N \mathcal{X}_{ij}^n \to \mathcal{P}\left(\prod_{i,j=1}^N \mathcal{Y}_{ij}^n\right) \qquad (15)$$



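The inputs at user i to $w \in \mathcal{M}_\mathcal{B}$ are $X_{ij} = \langle X_{ij}^n \rangle_1^\infty$,
$1 \le j \le N$. With these inputs, $w \in \mathcal{M}_\mathcal{B}$ produces outputs
$Y_{ji} = \langle Y_{ji}^n \rangle_1^\infty$, $1 \le j \le N$, at user i. Note the order of i
and j in the notation: the reproduction of source $X_{ij}$, which
is input at user i and destined for user j, is $Y_{ij}$ at user j.
Medium $m \in \mathcal{B}$ is said to be capable of communicating the i.i.d.
$X_{ij}$ source from user i to user j, $1 \le i,j \le N$, to within a
distortion $D_{ij}$ if there exist modems $\langle h_i^n \rangle_1^\infty$, independent
of the particular $m \in \mathcal{B}$, and non-negative real sequences
$\langle \omega_{ij,n} \rangle_1^\infty$, $\lim_{n \to \infty} \omega_{ij,n} = 0$, such that

$$\Pr\left(\frac{1}{n} d_{ij}^n(X_{ij}^n, Y_{ij}^n) > D_{ij}\right) \le \omega_{ij,n} \quad \forall i, j, \ \forall m \in \mathcal{B} \qquad (16)$$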






In a separation architecture, each modem $\langle h_i^n \rangle_1^\infty$ consists
of a source modem and a medium modem. The type of source modem
needed in Part II acts independently on the
sources $X_{ij}$, $1 \le j \le N$, at user i and the message reproductions
at user i: it source-encodes the sources $X_{ij}$, $1 \le j \le N$, to
within distortion levels $D_{ij}$ and source-decodes the message
reproductions into source reproductions $Y_{ji}$, $1 \le j \le N$.
Thus, the source modem at user i can be thought of as N independent
source encoders and N independent source decoders. The infimum of
rates needed to code the source $X_{ij}$ to within distortion $D_{ij}$
when the distortion function is $d_{ij}$, under the probability of
excess distortion criterion

$$\lim_{n \to \infty} \Pr\left(\frac{1}{n} d_{ij}^n(X_{ij}^n, Y_{ij}^n) > D_{ij}\right) = 0 \qquad (17)$$

is denoted by $R^P_{X_{ij}}(D_{ij})$.



The interconnection of medium modems and the medium 
is used for reliable communication: a precise discussion is 
omitted and will be clear when discussing the theorem and its 
proof. 



The question is: if there exists some architecture to communicate
the i.i.d. $X_{ij}$ sources to within distortions $D_{ij}$ over
$m \in \mathcal{B}$, does there exist a separation architecture, too?
Under certain assumptions, this question is answered in the
affirmative in the next section.



C. Optimality of separation for multi-user communication 
with fidelity criteria over a general, compound medium in the 
unicast setting 



This section generalizes Theorem I.1 of Part I.



Theorem II.1 (Optimality of separation: unicast, multi-user
setting). Assume that random codes are permitted. Let $m \in \mathcal{B}$
be capable of universally communicating the i.i.d. $X_{ij}$ source from
user i to user j, $1 \le i,j \le N$, to within distortion levels $D_{ij}$
under additive distortion functions $d_{ij}$. The sources $X_{ij}$ are
independent of each other. Then, reliable communication can
be accomplished from user i to user j, $1 \le i,j \le N$, over
$m \in \mathcal{B}$ at rates $R_{ij} < R^P_{X_{ij}}(D_{ij})$. This reliable communication can
be accomplished by consumption of the same or lesser medium
resources at each user as the medium resource consumption
in the original architecture for communication of the i.i.d. $X_{ij}$
sources to within distortion levels $D_{ij}$.

Further, if reliable communication can be accomplished over
$m \in \mathcal{B}$ from user i to user j, $1 \le i,j \le N$, at certain
rates strictly larger than $R^P_{X_{ij}}(D_{ij})$, then the independent,
i.i.d. $X_{ij}$ sources can be communicated from user i to user j,
$1 \le i,j \le N$, to within distortion levels $D_{ij}$ over $m \in \mathcal{B}$
by use of a separation architecture. The medium resource
consumption in this separation architecture at each user is
the same as or lesser than the medium resource consumption
in the architecture for reliable communication at rates strictly
$> R^P_{X_{ij}}(D_{ij})$ from user i to user j when all messages for
transmission between all pairs of users are independent
of each other and the distribution on every message is uniform.

Proof: The i.i.d. $X_{ij}$ sources can be communicated from user
i to user j, $1 \le i,j \le N$, to within distortion $D_{ij}$ with the
help of modems $\langle h_i^n \rangle_1^\infty$ at user i, $1 \le i \le N$. Consider two
particular users, user s and user r, and the communication from
user s to user r, neglecting all other users. The communication
of the i.i.d. $X_{sr}$ source from user s to user r to within distortion
$D_{sr}$ under additive distortion function $d_{sr}$ can be thought
of as point-to-point communication. By Theorem I.1, reliable
communication can be accomplished over $m \in \mathcal{B}$ from user
s to user r at rates $< R^P_{X_{sr}}(D_{sr})$ by use of an encoder-decoder
$\langle E_{sr}^n, F_{sr}^n \rangle_1^\infty$ which layers on top of $w \in \mathcal{M}_\mathcal{B}$.
$E_{sr}^n$ generates i.i.d. $X_{sr}$ codes. Thus, the distribution of the
inputs to $w \in \mathcal{M}_\mathcal{B}$ in the new architecture is the same
as in the old architecture. In particular, this implies
that the communication of sources $X_{ij}$, $(i,j) \ne (s,r)$, is
unaffected in distribution in the new architecture, and thus,
further in particular, for $(i,j) \ne (s,r)$, $X_{ij}$ is communicated
to within distortion $D_{ij}$ in the new architecture. The process
can be repeated inductively for all pairs of users (i, j) and thus,
reliable communication can be accomplished from user i to
user j, $1 \le i,j \le N$, over $m \in \mathcal{B}$ at rates $R_{ij} < R^P_{X_{ij}}(D_{ij})$.
The argument concerning medium resource consumption in
the architecture for reliable communication is the same as the
argument in the proof of Theorem I.1 and is omitted. The proof
of the second part of the theorem, concerning existence of a
separation architecture and the resource consumption in the
separation architecture, is the same as in the proof of Theorem
I.1 and is omitted. ■



D. Discussion: partial applicability to the traditional wireless 
telephony problem 

The traditional wireless telephony problem is the following:
there are 2N users $s_i$, $1 \le i \le N$, and $s_i'$, $1 \le i \le N$. User $s_i$
wishes to talk to user $s_i'$, $1 \le i \le N$. The voice signal at user
$s_i$ is $V_i$ and the voice signal at user $s_i'$ is $V_i'$. The question is:
how should architectures be designed to maximize the number
of pairs of users which can communicate at the same time
under certain constraints on resource consumption?

Note 

• Voice signals are pairwise independent: $V_i$ is independent
of $V_j$, $V_j'$ for $j \ne i$. However, $V_i$ and $V_i'$ are dependent

• Wireless medium is time varying and only partially 
known 

• Voice admits distortion 

• Other concerns: For example, security and delay

Assume

• Voice signals $V_i, V_j'$, $1 \le i,j \le N$, are independent,
not just pairwise independent. This assumption is clearly
incorrect but is made here

• Wireless medium can be modeled as discussed in Part II

• Distortion function for measuring the quality of voice
transmission is additive. This assumption is clearly
incorrect but is made here

• Voice can be modeled as a stationary ergodic process for 
which the result of Part II can be generalized 

• Other concerns, for example, delay and security are 
neglected in this discussion 

Then, it follows that assuming random codes are permitted, 
separation architectures are optimal for the traditional wireless 
telephony problem. 

As stated before, the assumption that voice signals $V_i, V_j'$,
$1 \le i,j \le N$, are independent does not hold. A simple problem
to understand the question when signals are dependent is the
following: there are two users, user s and user s'. User s
wishes to communicate source V to user s' under an additive
distortion function d and user s' wishes to communicate source
V' to user s under an additive distortion function d'. Sources
V and V' might be dependent. Then, does separation hold, and
if not, does separation hold to some extent? As stated before,
[15] gives examples where optimality of separation does not
hold if the sources are correlated. In the above example, the
authors do not expect the optimality of separation to hold
in general. A question to understand is whether approximate
optimality of separation in the sense, for example, of [14]
holds in this example.

E. Recapitulation for Part II 

Optimality of source-medium separation for multi-user com- 
munication with fidelity criteria over a general, compound 
medium in the unicast setting was proved, thus generalizing 
Part I. It was assumed that random codes are permitted. 
The probability of excess distortion criterion was used as the 
fidelity criterion. The proof is a simple generalization of Part 
I. Partial applicability to the traditional wireless telephony 
problem was discussed. 

A question to investigate is whether random codes are needed 
or not if B is compact. 

The proof technique of interconnecting sub-systems by maintaining
marginals is reminiscent of the behavioral, interconnection
view [16], [17], although in a stochastic setting.

III. Part III: Why does separation hold? 

A. Introduction to, and contribution of Part III 

Optimality of separation for communication with a fidelity
criterion was proved by Shannon in [3] and generalized to the
compound setting in Part I. The non-trivial step in the proofs
in [3] and Part I is to prove the statement,

"A channel which is capable of communicating the i.i.d. X 
source to within a certain distortion level is also capable of 
communicating bits reliably at any rate less than the infimum 
of the rates needed to code the i.i.d. X source to within the 
same distortion level." 



Neither the proof in [3] nor the proof in Part I gets to the heart
of why this statement is true. They provide proofs but not a
deep understanding: the reason for this is the crucial reliance
of the proofs on mutual-information expressions for the
rate-distortion function or the capacity of a channel, as discussed in
Part I. Part III gets to the heart of why this statement is true.
The channel-coding problem for obtaining rates of reliable 
communication can be thought of as a packing problem and 
the source-coding problem of obtaining rates needed for source 
compression can be thought of as a covering problem. The 
perspective taken in Part III is that of a randomized covering- 
packing perspective. In contrast to the discussion in Part 
I where the nature of the proof was semi-operational, the 
perspective in Part III is fully operational. 

The view in Part III is not to get to general conditions under
which separation holds and under which separation does not
hold: this view of getting to general conditions is taken, for
example, in [18]. The view in Part III is to make the relevant
assumptions needed in order to get to a conceptual, intuitive
understanding of why separation holds. In order to get to this
understanding, the uniform X source will be used instead
of the i.i.d. X source in the arguments. The uniform X source
puts a uniform distribution on sequences with type precisely $p_X$,
compared to the i.i.d. X source which puts "most" of the mass
on sequences with type "close to" $p_X$. The use of the uniform
X source, because of a single type class, avoids $\epsilon$-$\delta$ arguments,
and helps get to the essence of the optimality of separation for
communication with a fidelity criterion.

The setting in Part III is the same as the setting in Part I with 
the following differences: 

1) The uniform X source is used instead of the i.i.d. X 
source 

2) The distortion function is assumed to be permutation
invariant instead of additive. Permutation invariant
distortion functions are defined in Section III-B; additive
distortion functions are a special case of permutation
invariant distortion functions

3) A technical condition, stated in Theorem III.1, is
assumed on the rate-distortion function



B. Notation and definitions for Part III 

Notation and definitions from Subsection I-B of Part I will be
used. Recall, in particular, the superscript notation.

The source input space is $\mathcal{X}$ and the source reproduction space
is $\mathcal{Y}$. $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. X is a random variable on $\mathcal{X}$. Let
$p_X(x)$ be rational $\forall x$. Let $n_0$ denote the least positive integer
for which $n_0 p_X(x)$ is an integer $\forall x \in \mathcal{X}$. Let $\mathcal{U}^n$ denote the
set of sequences with (exact) type $p_X$. $\mathcal{U}^n$ is non-empty if and
only if $n_0$ divides n. Let $n' = n_0 n$. Let $U^{n'}$ denote a random
variable which is uniform on $\mathcal{U}^{n'}$ and zero elsewhere. Then,
$\langle U^{n'} \rangle_1^\infty$ is the uniform X source and is denoted by U. The
uniform X source can be defined only for those X for which
$p_X(x)$ is rational $\forall x \in \mathcal{X}$.

Let q denote a type on the set $\mathcal{Y}$ which is achievable when the
block-length is n'. $\mathcal{V}_q^{n'}$ is the set of all sequences with type q.
The uniform distribution on $\mathcal{V}_q^{n'}$ is $V_q^{n'}$.

Since the uniform X source is defined only for block-lengths
n', the distortion function, channels, encoders and decoders will
be defined only for block-lengths n'.
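The construction of the uniform X source can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the paper: the function names are hypothetical, the source distribution is passed as a dictionary of exact fractions, and a uniformly random sequence of exact type is obtained by shuffling a sequence with the prescribed letter counts (every distinct arrangement arises from the same number of shuffles, so the result is uniform on the type class).

```python
import random
from fractions import Fraction
from math import lcm  # available from Python 3.9


def least_block_length(p_x):
    """Least positive integer n0 such that n0 * p_x[x] is an integer for every x."""
    return lcm(*(Fraction(p).denominator for p in p_x.values()))


def sample_uniform_source(p_x, n, rng):
    """Draw one sequence of length n0 * n uniformly from the exact type class of p_x."""
    n_prime = least_block_length(p_x) * n
    sequence = []
    for symbol, p in p_x.items():
        # Exact type: symbol appears exactly n_prime * p_x(symbol) times.
        sequence.extend([symbol] * int(Fraction(p) * n_prime))
    rng.shuffle(sequence)
    return sequence


p_x = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
n0 = least_block_length(p_x)                              # 3
u = sample_uniform_source(p_x, n=2, rng=random.Random(0))  # length 6, two "a", four "b"
```

Contrast this with an i.i.d. draw from the same distribution, whose empirical type is only close to the source distribution with high probability.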

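$d = \langle d^{n'} \rangle_1^\infty$ is the distortion function where
$d^{n'} : \mathcal{X}^{n'} \times \mathcal{Y}^{n'} \to [0, \infty)$. Let $\pi^{n'}$ be a permutation (rearrangement) of
$(1, 2, \ldots, n')$. That is, for $1 \le i \le n'$, $\pi^{n'}(i) \in \{1, 2, \ldots, n'\}$
and $\pi^{n'}(i)$, $1 \le i \le n'$, are different. For $x^{n'} \in \mathcal{X}^{n'}$,
denote

$$\pi^{n'} x^{n'} \triangleq \left(x^{n'}(\pi^{n'}(1)), x^{n'}(\pi^{n'}(2)), \ldots, x^{n'}(\pi^{n'}(n'))\right) \qquad (18)$$

For $y^{n'} \in \mathcal{Y}^{n'}$, $\pi^{n'} y^{n'}$ is defined analogously. $\langle d^{n'} \rangle_1^\infty$ is
said to be permutation invariant if $\forall n'$ and for every permutation $\pi^{n'}$,

$$d^{n'}(\pi^{n'} x^{n'}, \pi^{n'} y^{n'}) = d^{n'}(x^{n'}, y^{n'}), \quad \forall x^{n'} \in \mathcal{X}^{n'}, \ y^{n'} \in \mathcal{Y}^{n'} \qquad (19)$$

The distortion function in Part III is assumed to be permutation
invariant. An additive distortion function is an example of a
permutation invariant distortion function.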
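The definition can be checked by brute force on small alphabets and block-lengths. The sketch below is an illustration only: the helper names and the three example distortion functions (additive Hamming, a maximum-based non-additive function, and a position-weighted function) are assumptions and do not appear in the paper.

```python
from itertools import permutations, product


def permute(seq, pi):
    """Apply permutation pi: position i of the result holds seq[pi[i]]."""
    return tuple(seq[i] for i in pi)


def is_permutation_invariant(dn, alphabet_x, alphabet_y, n):
    """Exhaustively test d(pi x, pi y) == d(x, y) for all pi, x, y of length n."""
    for pi in permutations(range(n)):
        for x in product(alphabet_x, repeat=n):
            for y in product(alphabet_y, repeat=n):
                if dn(permute(x, pi), permute(y, pi)) != dn(x, y):
                    return False
    return True


def additive_hamming(x, y):
    """Additive distortion: number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def max_hamming(x, y):
    """Not additive, but still permutation invariant: 1 if any position differs."""
    return max(int(a != b) for a, b in zip(x, y))


def position_weighted(x, y):
    """Not permutation invariant: a mismatch at position t costs t + 1."""
    return sum((t + 1) * (a != b) for t, (a, b) in enumerate(zip(x, y)))


results = {
    "additive": is_permutation_invariant(additive_hamming, (0, 1), (0, 1), 3),
    "max": is_permutation_invariant(max_hamming, (0, 1), (0, 1), 3),
    "weighted": is_permutation_invariant(position_weighted, (0, 1), (0, 1), 3),
}
```

The second example indicates why permutation invariance is a strictly weaker requirement than additivity.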

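The channel is a sequence $k = \langle k^{n'} \rangle_1^\infty$ where $k^{n'}$ is defined
as in Part I. The compound channel is a set $\mathcal{D}$ of channels
and is denoted by $k \in \mathcal{D}$. Encoder-decoder $\langle e^{n'}, f^{n'} \rangle_1^\infty$
is defined as in Part I with the difference that definitions are
made only for block-lengths n'. $k \in \mathcal{D}$ is said to communicate
the uniform X source to within a distortion D if the corresponding
condition of Part I is replaced with

$$\Pr\left(\frac{1}{n'} d^{n'}(U^{n'}, Y^{n'}) > D\right) \le \omega_{n'} \quad \forall k \in \mathcal{D} \qquad (20)$$

for some $\langle \omega_{n'} \rangle_1^\infty$, $\omega_{n'} \to 0$ as $n' \to \infty$.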

Source-code $\langle e_s^{n'}, f_s^{n'} \rangle_1^\infty$ is defined as in Part I with the
difference that definitions are made only for block-lengths n'.
$\langle e_s^{n'}, f_s^{n'} \rangle_1^\infty$ is said to code the uniform X source to within
a distortion D if the corresponding condition of Part I is replaced with

$$\lim_{n' \to \infty} \Pr\left(\frac{1}{n'} d^{n'}(U^{n'}, Y^{n'}) > D\right) = 0 \qquad (21)$$

The infimum of rates needed to code the uniform X source to
within a distortion D is $R^P_U(D)$. A definition not made in Part
I is that of the rate-distortion function under the inf probability
of excess distortion criterion. This definition was not needed
in Part I but is needed in Part III. $\langle e_s^{n'}, f_s^{n'} \rangle_1^\infty$ is said to
code the uniform X source to within a distortion D under the
inf probability of excess distortion criterion if (21) is replaced
with

$$\liminf_{n' \to \infty} \Pr\left(\frac{1}{n'} d^{n'}(U^{n'}, Y^{n'}) > D\right) = 0 \qquad (22)$$

The infimum of rates needed to code the uniform X source
to within a distortion D under the inf probability of excess
distortion criterion is denoted by $R^P_U(D, \inf)$.

Channel-code $\langle e_c^{n'}, f_c^{n'} \rangle_1^\infty$ is defined as in Part I with the
difference that definitions are made only for block-lengths n'.

Reliable achievability of rate R over $k \in \mathcal{D}$ is defined as in
Part I with the difference that only block-lengths n' matter.
Capacity of $k \in \mathcal{D}$ is the supremum of all reliably achievable
rates.

As in Part I, the source-code followed by the channel-code,
$\langle e_s^{n'} \circ e_c^{n'}, f_c^{n'} \circ f_s^{n'} \rangle_1^\infty$, is the separation-based
encoder-decoder.

As in Part I, the question is: if there exists some architecture to
communicate the uniform X source to within a distortion D
over $k \in \mathcal{D}$, does there exist a separation architecture too?
This question is answered in the affirmative under certain
assumptions in the next subsection; note, in particular, the
technical assumption on $R^P_U(D)$ in the statement of Theorem
III.1. The important point for Part III is that the proof gets to
the essence of why separation holds for communication with
a fidelity criterion.

C. Optimality of separation for communication with a fidelity 
criterion over a general, compound channel 

Theorem III.1 (Optimality of separation). Assume that random
codes are permitted. Assume that $R^P_U(D, \inf) = R^P_U(D)$.
Let $k \in \mathcal{D}$ be capable of communicating the uniform X
source to within a distortion level D under a permutation
invariant distortion function d. Then, reliable communication
can be accomplished over $k \in \mathcal{D}$ at rates $< R^P_U(D)$. This reliable
communication can be accomplished with consumption of
channel resources same as the channel resource consumption
in the original architecture which communicates the uniform
X source to within a distortion D over $k \in \mathcal{D}$.

Further, if reliable communication can be accomplished over
$k \in \mathcal{D}$ at a certain rate strictly $> R^P_U(D)$, then the uniform X
source can be communicated to within a distortion D over
$k \in \mathcal{D}$ by use of a separation architecture. The channel resource
consumption in this separation architecture is the same as the
channel resource consumption in the architecture for reliable
communication at rate strictly $> R^P_U(D)$ when the distribution
on the message set is uniform.

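Proof: $k \in \mathcal{D}$ is capable of communicating the uniform X
source to within a distortion D with the help of some encoder-decoder
$\langle e^{n'}, f^{n'} \rangle_1^\infty$. Consider the channel set

$$\mathcal{C}_\mathcal{D} = \left\{ \langle e^{n'} \circ k^{n'} \circ f^{n'} \rangle_1^\infty, \ k \in \mathcal{D} \right\} \qquad (23)$$

$c = \langle c^{n'} \rangle_1^\infty \in \mathcal{C}_\mathcal{D}$ is a compound channel with input space $\mathcal{X}$
and output space $\mathcal{Y}$. It will be proved that by use of some
encoder-decoder $\langle E^{n'}, F^{n'} \rangle_1^\infty$, reliable communication can
be accomplished over $c \in \mathcal{C}_\mathcal{D}$ at rates $< R^P_U(D, \inf)$ with
consumption of same channel resources as the architecture
$\langle e^{n'} \circ k^{n'} \circ f^{n'} \rangle_1^\infty$, when used for communicating the uniform
X source to within a distortion D. By use of the assumption
$R^P_U(D) = R^P_U(D, \inf)$, it will follow that by use of encoder-decoder
$\langle E^{n'} \circ e^{n'}, f^{n'} \circ F^{n'} \rangle_1^\infty$, reliable communication
can be accomplished over $k \in \mathcal{D}$ at rates $< R^P_U(D)$ with
consumption of same channel resources as the architecture
$\langle e^{n'} \circ k^{n'} \circ f^{n'} \rangle_1^\infty$, when used for communicating the
uniform X source to within a distortion D.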






This will be done by use of parallel random-coding arguments
for two problems:

• Channel-coding problem: Rates of reliable communication
over $c \in \mathcal{C}_\mathcal{D}$.

• Source-coding problem: Rates of coding for the uniform
X source to within a distortion D under the inf probability
of excess distortion criterion.

Codebook generation:

• Codebook generation for the channel-coding problem:
Let reliable communication be desired at rate R. Generate
$2^{\lfloor n'R \rfloor}$ sequences independently and uniformly from $\mathcal{U}^{n'}$.
This is the codebook $\mathcal{K}^{n'}$.

• Codebook generation for the source-coding problem: Let
source-coding be desired at rate R. Generate $2^{\lfloor n'R \rfloor}$
codewords independently and uniformly from $\mathcal{V}_q^{n'}$ for
some type q on $\mathcal{Y}$ which is achievable for block-length
n'. This is the codebook $\mathcal{L}^{n'}$.

Joint typicality:

Joint typicality for both the channel-coding and source-coding
problems is defined as follows: $(u^{n'}, y^{n'}) \in \mathcal{U}^{n'} \times \mathcal{Y}^{n'}$ is jointly
typical if

$$\frac{1}{n'} d^{n'}(u^{n'}, y^{n'}) \le D \qquad (24)$$

Decoding and encoding:

• Decoding for the channel-coding problem: Let $y^{n'}$ be
received. If there exists a unique $u^{n'} \in \mathcal{K}^{n'}$ for which
$(u^{n'}, y^{n'})$ is jointly typical, declare that $u^{n'}$ is transmitted,
else declare error.

• Encoding for the source-coding problem: Let $u^{n'} \in \mathcal{U}^{n'}$
need to be source-coded. If there exists some $y^{n'} \in \mathcal{L}^{n'}$
for which $(u^{n'}, y^{n'})$ is jointly typical,
encode $u^{n'}$ to one such $y^{n'}$, else declare error.
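The parallel between the two random-coding constructions can be sketched in Python. The sketch is an illustration under assumptions that are not in the paper: all function names are hypothetical, types are given as dictionaries of letter counts, and the distortion is taken to be additive Hamming distortion. Both problems use the same test, namely whether the normalized distortion is at most D; the channel decoder requires a unique match in a codebook of source-type sequences, while the source encoder accepts any match in a codebook of reproduction-type sequences.

```python
import math
import random


def hamming_distortion(u, y):
    """Assumed additive distortion: number of positions where u and y differ."""
    return sum(a != b for a, b in zip(u, y))


def draw_from_type_class(counts, rng):
    """One sequence drawn uniformly from the sequences with the given letter counts."""
    seq = [s for s, c in counts.items() for _ in range(c)]
    rng.shuffle(seq)
    return tuple(seq)


def generate_codebook(counts, n_prime, rate, rng):
    """2^floor(n' R) codewords, independent and uniform on one type class."""
    size = 2 ** math.floor(n_prime * rate)
    return [draw_from_type_class(counts, rng) for _ in range(size)]


def jointly_typical(u, y, D):
    """The typicality test: normalized distortion at most D."""
    return hamming_distortion(u, y) / len(u) <= D


def channel_decode(codebook, y, D):
    """Index of the unique codeword typical with y; None stands for 'declare error'."""
    matches = [i for i, u in enumerate(codebook) if jointly_typical(u, y, D)]
    return matches[0] if len(matches) == 1 else None


def source_encode(codebook, u, D):
    """Index of some codeword typical with u; None stands for 'declare error'."""
    for i, y in enumerate(codebook):
        if jointly_typical(u, y, D):
            return i
    return None


rng = random.Random(1)
channel_codebook = generate_codebook({0: 2, 1: 2}, n_prime=4, rate=0.75, rng=rng)
```

For example, with the fixed codebook `[(0, 0, 1, 1), (1, 1, 0, 0)]`, received sequence `(0, 0, 1, 0)` and `D = 0.25`, only the first codeword passes the test, so `channel_decode` returns index 0.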

Some notation:

• Notation for the channel-coding problem: Let message
$m^{n'} \in \mathcal{M}_R^{n'}$ be transmitted. The codeword corresponding
to $m^{n'}$ is $u_c^{n'}$. Non-transmitted codewords are
$u_1'^{n'}, u_2'^{n'}, \ldots, u'^{n'}_{2^{\lfloor n'R \rfloor}-1}$. $u_c^{n'}$ is a realization of $U_c^{n'}$. $U_c^{n'}$
is uniform on $\mathcal{U}^{n'}$. $u_i'^{n'}$ is a realization of $U_i'^{n'}$. $U_i'^{n'}$ is
uniform on $\mathcal{U}^{n'}$, $1 \le i \le 2^{\lfloor n'R \rfloor} - 1$. $U_c^{n'}, U_i'^{n'}$, $1 \le i \le
2^{\lfloor n'R \rfloor} - 1$, are independent of each other. The channel
output is $y^{n'}$. $y^{n'}$ is a realization of $Y^{n'}$. $y^{n'}$ may depend
on $u_c^{n'}$ but does not depend on $u_i'^{n'}$, $1 \le i \le 2^{\lfloor n'R \rfloor} - 1$.
As random variables, $Y^{n'}$ and $U_c^{n'}$ might be dependent
but $Y^{n'}, U_i'^{n'}$, $1 \le i \le 2^{\lfloor n'R \rfloor} - 1$, are independent. If
the type q of the sequence $y^{n'}$ needs to be explicitly
denoted, the sequence is denoted by $y_q^{n'}$. $\mathcal{G}^{n'}$ is the set
of all achievable types q on $\mathcal{Y}$ for block-length n'. $Y^{n'}$
may depend on the channel $c \in \mathcal{C}_\mathcal{D}$; this dependence is
suppressed.

• Notation for the source-coding problem: $u^{n'}$ is the
sequence which needs to be source-coded. $u^{n'}$ is a
realization of $U^{n'}$ which is uniformly distributed on $\mathcal{U}^{n'}$.
The codewords are $y_{q,i}^{n'}$, $1 \le i \le 2^{\lfloor n'R \rfloor}$, where q denotes
the type. $y_{q,i}^{n'}$ is a realization of $V_{q,i}^{n'}$, $1 \le i \le 2^{\lfloor n'R \rfloor}$,
where $V_{q,i}^{n'}$ is uniformly distributed on the subset of $\mathcal{Y}^{n'}$
consisting of all sequences with type q. $u^{n'}, y_{q,i}^{n'}$, $1 \le i \le
2^{\lfloor n'R \rfloor}$, are independently generated; as random variables,
$U^{n'}, V_{q,i}^{n'}$, $1 \le i \le 2^{\lfloor n'R \rfloor}$, are independent. $\mathcal{G}^{n'}$ is the set
of all achievable types q on $\mathcal{Y}$ for block-length n'.

Error analysis: For the channel-coding problem, the probability
of correct decoding is analyzed and for the source-coding
problem, the probability of error is analyzed.

• Error analysis for the channel-coding problem: From
the encoding-decoding rule, it follows that the event
of correct decoding given that a particular message is
transmitted is

$$\left\{ \frac{1}{n'} d^{n'}(u_c^{n'}, y^{n'}) \le D \right\} \cap \bigcap_{i=1}^{2^{\lfloor n'R \rfloor}-1} \left\{ \frac{1}{n'} d^{n'}(u_i'^{n'}, y^{n'}) > D \right\} \qquad (25)$$

• Error analysis for the source-coding problem: From the
encoding-decoding rule, it follows that the error event
given that a particular message needs to be source-coded
is

$$\bigcap_{i=1}^{2^{\lfloor n'R \rfloor}} \left\{ \frac{1}{n'} d^{n'}(u^{n'}, y_{q,i}^{n'}) > D \right\} \qquad (26)$$

Note that there is a choice of q for codebook generation.
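Calculation:

• Calculation of the probability of correct decoding for the
channel-coding problem:

A bound on the probability of the correct decoding event
(25) is calculated, using essentially standard arguments
for calculating such bounds, in Appendix C and is

$$\ge -\omega_{n'} + \left[ \inf_{y_q^{n'} \in \mathcal{Y}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y_q^{n'}) > D \right) \right]^{2^{\lfloor n'R \rfloor}-1} \qquad (27)$$

where $U^{n'}$ is uniform on $\mathcal{U}^{n'}$.
Rate R is achievable if

$$-\omega_{n'} + \left[ \inf_{y_q^{n'} \in \mathcal{Y}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y_q^{n'}) > D \right) \right]^{2^{\lfloor n'R \rfloor}-1} \to 1 \text{ as } n' \to \infty \qquad (28)$$

$\omega_{n'} \to 0$ as $n' \to \infty$. It follows that rate R is achievable
if

$$\left[ \inf_{y_q^{n'} \in \mathcal{Y}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y_q^{n'}) > D \right) \right]^{2^{\lfloor n'R \rfloor}-1} \to 1 \text{ as } n' \to \infty \qquad (29)$$

• Calculation of probability of error for the source-coding
problem:

An expression for the probability of the error event (26) is
calculated, using standard arguments for calculating these
probabilities, in Appendix C and is

$$\left[ \inf_{q \in \mathcal{G}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(u^{n'}, V_q^{n'}) > D \right) \right]^{2^{\lfloor n'R \rfloor}} \qquad (30)$$

where $V_q^{n'}$ is uniform on $\mathcal{V}_q^{n'}$. The infimum in the above
expression reflects the existing choice of type in the
codeword generation process.

Since the inf probability of excess distortion criterion is
used, it follows that rate R is achievable if

$$\left[ \inf_{q \in \mathcal{G}^{n_i'}} \Pr\left( \frac{1}{n_i'} d^{n_i'}(u^{n_i'}, V_q^{n_i'}) > D \right) \right]^{2^{\lfloor n_i'R \rfloor}} \to 0 \text{ for some } n_i' = n_0 n_i, \ n_i \to \infty \qquad (31)$$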
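Connection between channel-coding and source-coding:
The calculation required in the channel-coding problem is

$$\inf_{y_q^{n'} \in \mathcal{Y}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y_q^{n'}) > D \right) \qquad (32)$$

and the calculation required in the source-coding problem is

$$\inf_{q \in \mathcal{G}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(u^{n'}, V_q^{n'}) > D \right) \qquad (33)$$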

It will be proved that (32) and (33) are equal. It will be proved
more generally that

$$\Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y_q^{n'}) > D \right) = \Pr\left( \frac{1}{n'} d^{n'}(u^{n'}, V_q^{n'}) > D \right) \qquad (34)$$

This is a symmetry argument and requires the assumption of a
permutation invariant distortion function. The idea is that the
left hand side of (34) depends only on the type of $y_q^{n'}$. From
this it follows that the left hand side of (34) is equal to

$$\Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, V_q^{n'}) > D \right) \qquad (35)$$

where $V_q^{n'}$ is independent of $U^{n'}$. Similarly, the right hand
side of (34) depends only on the type of $u^{n'}$ and from this it
follows that the right hand side of (34) is also equal to (35).
(34) follows. Details are in the appendix.
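The symmetry argument can be checked exhaustively on a toy instance. The following Python sketch is an illustration only: the block-length 4, the binary alphabets, the two types, the level D = 1/4 and the Hamming distortion are all assumptions chosen so that the type classes can be enumerated. It computes the excess distortion probability with the reproduction sequence fixed and the source sequence uniform on its type class, then with the roles exchanged, then with both sequences random and independent.

```python
from fractions import Fraction
from itertools import permutations


def type_class(counts):
    """All distinct sequences with the given letter counts."""
    seq = [s for s, c in counts.items() for _ in range(c)]
    return sorted(set(permutations(seq)))


def hamming(u, y):
    """Assumed permutation invariant (indeed additive) distortion."""
    return sum(a != b for a, b in zip(u, y))


def excess_probability(pairs, D):
    """Fraction of (u, y) pairs whose normalized distortion exceeds D."""
    pairs = list(pairs)
    hits = sum(1 for u, y in pairs if Fraction(hamming(u, y), len(u)) > D)
    return Fraction(hits, len(pairs))


D = Fraction(1, 4)
U_class = type_class({0: 2, 1: 2})  # source type class (exact type of the source)
V_class = type_class({0: 1, 1: 3})  # reproduction type class for one type q

# Reproduction sequence fixed, source sequence uniform on its type class.
fixed_y = {excess_probability(((u, y) for u in U_class), D) for y in V_class}
# Source sequence fixed, reproduction sequence uniform on the type class of q.
fixed_u = {excess_probability(((u, v) for v in V_class), D) for u in U_class}
# Both sequences random and independent.
both = excess_probability(((u, v) for u in U_class for v in V_class), D)
```

In this instance each of the three computations gives the single value 1/2, whichever sequence of the relevant type is held fixed, which is the content of the equality between the two sides and the joint expression.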



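Proof that a channel which is capable of communicating the
uniform X source to within a certain distortion level is also
capable of communicating bits reliably at any rate less than
the infimum of the rates needed to code the uniform X source
to within the same distortion level under the inf probability
of excess distortion criterion:

Denote

$$A_{n'} = \inf_{q \in \mathcal{G}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y_q^{n'}) > D \right) = \inf_{q \in \mathcal{G}^{n'}} \Pr\left( \frac{1}{n'} d^{n'}(u^{n'}, V_q^{n'}) > D \right) \qquad (36)$$

From (29), it follows that rate R is achievable for the
channel-coding problem if

$$\left[A_{n'}\right]^{2^{\lfloor n'R \rfloor}-1} \to 1 \text{ as } n' \to \infty \qquad (37)$$

From (31), it follows that rate R is achievable for the
source-coding problem if

$$\left[A_{n_i'}\right]^{2^{\lfloor n_i'R \rfloor}} \to 0 \text{ as } n_i' \to \infty \text{ for some } n_i' = n_0 n_i \text{ for some } n_i \to \infty \qquad (38)$$

Let

$$\alpha = \sup\{R \mid (37) \text{ holds}\} \qquad (39)$$

Then, if $R' > \alpha$,

$$\lim_{n_i' \to \infty} \left[A_{n_i'}\right]^{2^{\lfloor n_i'R' \rfloor}-1} < 1 \quad \forall R' > \alpha, \text{ for some sequence } n_i' \to \infty \qquad (40)$$

$n_i'$ may depend on R'.
Then,

$$\lim_{n_i' \to \infty} \left(A_{n_i'}\right)^{2^{\lfloor n_i'R'' \rfloor}} = 0 \text{ for } R'' > R' \qquad (41)$$

(40) and (41) hold for all $R'' > R' > \alpha$. It follows that rates
larger than $\alpha$ are achievable for the source-coding problem.
Hence $R^P_U(D, \inf) \le \alpha$, whereas, by (37) and (39), every rate
less than $\alpha$ is achievable for the channel-coding problem.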



Thus, a channel which is capable of communicating the 
uniform X source to within a certain distortion level is also 
capable of communicating bits reliably at any rate less than 
the infimum of the rates needed to code the uniform X source 
to within the same distortion level under the inf probability 
of excess distortion criterion. 
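The threshold behaviour behind this conclusion can be illustrated numerically. The sketch below is not taken from the paper: it assumes, purely for illustration, that the infimum of the excess distortion probabilities behaves like 1 - 2^(-a n) for a hypothetical exponent a, and evaluates that quantity raised to the power 2^floor(nR) (source-coding) or 2^floor(nR) - 1 (channel-coding). Under this assumed form the power tends to 1 for rates below a and to 0 for rates above a, so the same number a plays the role of the channel-coding limit and of the source-coding limit.

```python
import math


def power_of_A(n, rate, a, minus_one):
    """(1 - 2^(-a n)) ** (2^floor(n rate) [- 1]) under the assumed exponential form."""
    p = 2.0 ** (-a * n)  # assumed probability that the distortion is at most D
    exponent = 2 ** math.floor(n * rate) - (1 if minus_one else 0)
    # log1p keeps precision when p is tiny; exp underflows gracefully to 0.0.
    return math.exp(exponent * math.log1p(-p))


a = 0.5  # hypothetical exponent
below = [power_of_A(n, 0.3, a, minus_one=True) for n in (20, 40, 60)]   # rate < a
above = [power_of_A(n, 0.7, a, minus_one=False) for n in (20, 40, 60)]  # rate > a
```

With these assumed numbers, the entries of `below` increase towards 1 and the entries of `above` fall towards 0 as n grows.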

Wrapping up the proof of the theorem: 

It follows that if $k \in \mathcal{D}$ is capable of communicating the
uniform X source to within a distortion level D under a
permutation invariant distortion function d, then reliable
communication can be accomplished over k at rates $< R^P_U(D, \inf)$.
By use of the assumption $R^P_U(D) = R^P_U(D, \inf)$, it follows
that reliable communication can be accomplished over k at
rates $< R^P_U(D)$. The argument concerning channel resource
consumption is the same as in the proof of Theorem I.1.
The argument for the second part of the theorem concerning
existence of a separation architecture and channel resource
consumption in the separation architecture is the same as in
the proof of Theorem I.1.

D. Discussion

The proof of Theorem III.1 can be viewed as exhibiting a
randomized covering-packing perspective on the optimality
of separation for communication with a fidelity criterion:
the source-coding problem can be thought of as a covering
problem and the channel-coding problem can be thought of as
a packing problem; the proof uses a random-coding argument
for each of these problems and draws a parallel between
them. The essence of why separation holds is captured in
(34). The reader should compare this proof with the
Shannon-Lapidoth proof described in Part I. The proof can also be
viewed as a connection (duality) between source-coding and
channel-coding. The proof only uses the operational meanings
of reliable communication and source coding and not explicit
functional simplifications, for example, mutual-information
expressions, for channel capacity or the rate-distortion
function; the extent to which functional simplifications are required
is in the calculations needed to arrive at (32) and (33).
A precise definition of an operational proof is not possible;
however, it can be intuitively interpreted from the context in
which the word is used.

The technical condition $R^P_U(D) = R^P_U(D, \inf)$ is made on the
rate-distortion function. It is unlikely that this technical
condition will hold for an arbitrary permutation-invariant distortion
function $d = \langle d^{n'} \rangle_1^\infty$; however, the authors conjecture that
this technical condition will hold for "many well behaved"
permutation invariant distortion functions. The validity of this
technical condition is proved operationally for an additive
distortion function in Chapter 5 of [19]; recall that an additive
distortion function is permutation invariant.

The proof technique of Part III is generalized to the i.i.d. X
source for an additive distortion function in Chapter 5 of [19].
The authors conjecture that the proof technique of Part III
can be used for the i.i.d. X source for many "well behaved"
permutation invariant distortion functions.

An alternate proof of the rate-distortion theorem which the
authors believe is more fundamental than the original proof of
Shannon is presented in Chapter 5 of [19].

E. Recapitulation for Part III

A perspective which gets to the heart of why separation holds 
for the problem of communication with a fidelity criterion 
is provided. The perspective is an operational, randomized 
covering-packing perspective. 



IV. Recapitulation 

The abstract to this paper and the recapitulations to each of
Part I, Part II, and Part III together recapitulate this paper. The
development in this paper is brief; an elaborate development
of a large part of this paper and further discussions can be
found in [19].



Appendix A 
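Error analysis in the proof of Theorem I.1

Denote: $\langle s^n \rangle_1^\infty = \langle E^n \circ c^n \circ F^n \rangle_1^\infty$.
Denote:

• $x^n \in \mathcal{K}^n$ is the transmitted codeword corresponding to
a particular message $m^n$

• $y^n \in \mathcal{Y}^n$ is the received sequence

• $z^n \in \mathcal{K}^n$ is a codeword which is not transmitted

$s^n(\{m^n\}^c \mid m^n)$ needs to be calculated.

$$s^n(\{m^n\}^c \mid m^n) \le \Pr(\mathcal{E}_1^n \cup \mathcal{E}_2^n) \le \Pr(\mathcal{E}_1^n) + \Pr(\mathcal{E}_2^n) \qquad (42)$$

where

• $\mathcal{E}_1^n$: $(x^n, y^n)$ is not $\epsilon$ jointly typical

• $\mathcal{E}_2^n$: $\exists z^n$ such that $(z^n, y^n)$ is $\epsilon$ jointly typical

By definition of $c \in \mathcal{C}_\mathcal{A}$, $\Pr(\mathcal{E}_1^n) \to 0$ as $n \to \infty$ at a uniform
rate over $c \in \mathcal{C}_\mathcal{A}$, and independently of $m^n$.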
rate over c G C^, and independently of m". 

Calculation of Pr(£^) requires a method of types calculation, 
see m or ll20l . similar in arguments to those used in the proof 
of Lemma 1 in |6| which, in turn, are similar to the error 
analysis in the conference version [SJ of Part I. This calculation 
is carried out below: 

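Fix $y^n$. Let $y^n$ have type $q_Y$. Let $z^n$ be a particular
non-transmitted codeword. The distribution $Z^n$ on $z^n$ is generated
i.i.d. X. $Z^n$ and $y^n$ are independent of each other. Let $q_{Z|Y}$
be a conditional distribution $\mathcal{Y} \to \mathcal{P}(\mathcal{X})$. $q_{ZY}$ is the joint
distribution resulting from $q_Y$ and $q_{Z|Y}$. Then,

$$\Pr\left(p_{z^n|y^n} = q_{Z|Y} \mid p_{y^n} = q_Y\right) \le \prod_{y \in \mathcal{Y}} 2^{-n q_Y(y) D(q_{Z|Y}(\cdot|y) \| p_X)} = 2^{-n \sum_{y \in \mathcal{Y}} q_Y(y) D(q_{Z|Y}(\cdot|y) \| p_X)} = 2^{-n D(q_{ZY} \| p_X q_Y)} \qquad (43)$$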
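The building block of this calculation is the standard method of types bound: the probability that an i.i.d. sequence has a given exact type is at most 2 raised to minus n times the divergence between that type and the generating distribution. The conditional version used above applies this bound separately on the positions where the received sequence takes each letter. The following Python sketch checks the unconditional bound on a small example; the function names and the numbers are assumptions made for illustration.

```python
from math import factorial, log2, prod


def type_probability(counts, p):
    """Exact probability that an i.i.d. p sequence has the given letter counts."""
    n = sum(counts.values())
    multinomial = factorial(n) // prod(factorial(c) for c in counts.values())
    return multinomial * prod(p[x] ** c for x, c in counts.items())


def divergence_bound(counts, p):
    """Method of types upper bound 2^(-n D(q || p)), where q is the empirical type."""
    n = sum(counts.values())
    divergence = sum((c / n) * log2((c / n) / p[x])
                     for x, c in counts.items() if c > 0)
    return 2.0 ** (-n * divergence)


p = {0: 0.5, 1: 0.5}
counts = {0: 3, 1: 1}                    # type (3/4, 1/4) at block-length 4
exact = type_probability(counts, p)      # 4 sequences, each of probability 1/16
bound = divergence_bound(counts, p)
```

Here `exact` is 0.25, which lies below `bound`, as the inequality requires.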

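It follows by

• noting that $q_Y$ is arbitrary,

• number of joint types on $\mathcal{X}^n \times \mathcal{Y}^n$ is $\le (n + 1)^{|\mathcal{X}||\mathcal{Y}|}$,

• and by use of the union bound,

that

$$\Pr(\mathcal{E}_2^n) \le (n + 1)^{|\mathcal{X}||\mathcal{Y}|} \, 2^{\lfloor nR \rfloor} \, 2^{-n \inf_{q_{ZY} \in T} D(q_{ZY} \| p_X q_Y)}$$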



where 



T = { qzY 



qz epx ±e 



(44) 



(45) 



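The above bound on $\Pr(\mathcal{E}_2^n)$ is independent of $m^n$ and $c \in \mathcal{C}_\mathcal{A}$.
As stated before, $\Pr(\mathcal{E}_1^n) \to 0$ as $n \to \infty$ at a uniform rate
over $c \in \mathcal{C}_\mathcal{A}$, and independently of $m^n$.

Thus, rates $R < \inf_{q_{ZY} \in T} D(q_{ZY} \| p_X q_Y)$ are reliably
achievable over $c \in \mathcal{C}_\mathcal{A}$. Now,

$$D(q_{ZY} \| p_X q_Y) = D(q_Z \| p_X) + D(q_{ZY} \| q_Z q_Y) \ge D(q_{ZY} \| q_Z q_Y) \qquad (46)$$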
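The decomposition of the divergence into a marginal term plus a mutual information term can be verified numerically. The sketch below is an illustration only; the joint distribution and the source distribution are arbitrary assumed numbers and the helper name is hypothetical.

```python
from math import isclose, log2


def kl(p, q):
    """Divergence D(p || q) in bits for distributions given as dictionaries."""
    return sum(p[k] * log2(p[k] / q[k]) for k in p if p[k] > 0)


# Assumed joint distribution of (Z, Y) and assumed source distribution p_X.
q_zy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {0: 0.6, 1: 0.4}

# Marginals of the joint distribution.
q_z = {z: sum(v for (a, _), v in q_zy.items() if a == z) for z in (0, 1)}
q_y = {y: sum(v for (_, b), v in q_zy.items() if b == y) for y in (0, 1)}

px_qy = {(z, y): p_x[z] * q_y[y] for (z, y) in q_zy}
qz_qy = {(z, y): q_z[z] * q_y[y] for (z, y) in q_zy}

lhs = kl(q_zy, px_qy)               # divergence from the product of p_X and q_Y
mutual_information = kl(q_zy, qz_qy)  # I(Z; Y) under the joint distribution
rhs = kl(q_z, p_x) + mutual_information
```

Since the marginal divergence term is non-negative, `lhs` is at least `mutual_information`, which is the inequality used in the next step.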
Thus, rates

$$R < \inf_{q_{ZY} \in T} D(q_{ZY} \| q_Z q_Y) = \inf_{q_{ZY} \in T} I(Z; Y) \qquad (47)$$

are reliably achievable over $c \in \mathcal{C}_\mathcal{A}$. Now,

inf I(Z:Y)= inf Ri(D) (48) 



Thus, rates 



R < inf RUD) 



(49) 



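are reliably achievable over $c \in \mathcal{C}_\mathcal{A}$. $\epsilon > 0$ is arbitrary and
$R^E_X(\cdot)$ is continuous. Thus, rates $R < R^E_X(D)$ are reliably
achievable over $c \in \mathcal{C}_\mathcal{A}$. $R^E_X(D) = R^P_X(D)$. Thus, rates $<
R^P_X(D)$ are reliably achievable over $c \in \mathcal{C}_\mathcal{A}$.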



Appendix B

Shannon-Lapidoth view of the optimality of
separation for communication with a fidelity
criterion over a compound DMC

Let k =< fc" >^G ^ be a compound DMC which is capable 
of communicating the i.i.d. X source under the expected 
distortion criterion. As stated in Subsection |I-D| it will be 
proved that C'{k e A) > Rx{D) from which it will follow 
that the capacity of fc G is > R^{D). This will prove 
the equivalent of the first part of Theorem |LT| for a compound 
DMC. An argument will be made concerning channel resource 
consumption. The equivalent of second part follows by source- 
coding followed by channel-coding, as in the proof of Theorem 

n] 

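To prove $C'(k \in \mathcal{A}) \ge R^E_X(D)$, it will be proved that $\forall n$,
$C'(k \in \mathcal{A}) \ge R^E_X(D + \omega_n)$. By the continuity of $R^E_X(\cdot)$, it
will follow that $C'(k \in \mathcal{A}) \ge R^E_X(D)$. To prove
$C'(k \in \mathcal{A}) \ge R^E_X(D + \omega_n)$, it will be proved that

$$
n R^E_X(D + \omega_n) \le \inf_{k \in \mathcal{A}} I(X^n; Y^n) \tag{50}
$$

and

$$
\inf_{k \in \mathcal{A}} I(X^n; Y^n) \le n C'(k \in \mathcal{A}) \tag{51}
$$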



In (50) and (51), $Y^n$ may depend on the particular $k \in \mathcal{A}$; this
dependence is suppressed in the notation. It is over this dependence
on $k \in \mathcal{A}$ that the infimum of the mutual informations is taken.



If $\mathcal{A}$ is a singleton, (50) is proved in [3]. If $\mathcal{A}$ is not a singleton,
(50) then follows by taking an infimum over all $k \in \mathcal{A}$.



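To prove (51), with input $X^n$ to the encoder $e^n$, denote the
input to $k^n$ by $I^n$ and the output of $k^n$ by $O^n$. $I^n$ is independent
of $k \in \mathcal{A}$ and $O^n$ might depend on $k \in \mathcal{A}$. Fix a particular
$k \in \mathcal{A}$. By the data processing inequality,

$$
I(X^n; Y^n) \le I(I^n; O^n) \tag{52}
$$

Define, for $i \in \mathcal{I}$,

$$
T(i) = \frac{1}{n} \sum_{t=1}^{n} \Pr(I^n(t) = i) \tag{53}
$$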



Since $e^n$ is independent of the particular $k \in \mathcal{A}$, $\Pr(I^n(t) = i)$
is independent of the particular $k \in \mathcal{A}$. Thus, $T(\cdot)$ is independent
of the particular $k \in \mathcal{A}$.

Then,

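$$
\begin{aligned}
I(I^n; O^n)
&= H(O^n) - H(O^n | I^n) \\
&\le \sum_{t=1}^{n} H(O^n(t)) - H(O^n | I^n)
  \quad \text{(by using } H(P, Q) = H(P) + H(Q|P) \text{ and } H(P|Q,R) \le H(P|Q)\text{)} \\
&= \sum_{t=1}^{n} H(O^n(t)) - \sum_{t=1}^{n} H(O^n(t) | I^n, O^n(1), O^n(2), \ldots, O^n(t-1)) \\
&= \sum_{t=1}^{n} H(O^n(t)) - \sum_{t=1}^{n} H(O^n(t) | I^n(t))
  \quad \text{(since } k \text{ is a DMC)} \\
&= \sum_{t=1}^{n} I(I^n(t); O^n(t)) \\
&\le n I(T, k)
  \quad \text{(by the concavity of } I(\cdot, k)\text{)}
\end{aligned} \tag{54}
$$

Thus,

$$
\inf_{k \in \mathcal{A}} I(I^n; O^n) \le \inf_{k \in \mathcal{A}} n I(T, k) \tag{55}
$$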
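The last step of the chain rests on the concavity of mutual information in the input distribution for a fixed channel. The following is a minimal numerical sketch of that step; the channel matrix `w` and the per-letter input distributions are hypothetical values chosen for illustration, and `t_avg` plays the role of the averaged distribution $T$.

```python
import math

def mutual_information(p, w):
    """I(p, w) in bits for input law p and channel w[i][o] = Pr(output o | input i)."""
    n_out = len(w[0])
    out = [sum(p[i] * w[i][o] for i in range(len(p))) for o in range(n_out)]
    total = 0.0
    for i, pi in enumerate(p):
        for o in range(n_out):
            if pi > 0 and w[i][o] > 0:
                total += pi * w[i][o] * math.log2(w[i][o] / out[o])
    return total

# Hypothetical binary-input DMC and per-letter input laws Pr(I^n(t) = .), t = 1..n.
w = [[0.9, 0.1], [0.2, 0.8]]
per_letter = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2], [0.3, 0.7]]
n = len(per_letter)

# Averaged input distribution, as in the definition of T.
t_avg = [sum(p[i] for p in per_letter) / n for i in range(2)]

sum_per_letter = sum(mutual_information(p, w) for p in per_letter)
bound = n * mutual_information(t_avg, w)
print(sum_per_letter, bound)
```

By Jensen's inequality applied to the concave map $P \mapsto I(P, k)$, the sum of per-letter mutual informations never exceeds $n I(T, k)$.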



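Now,

$$
\inf_{k \in \mathcal{A}} n I(T, k) \le \sup_{Q \in \mathcal{P}(\mathcal{I})} \inf_{k \in \mathcal{A}} n I(Q, k) \tag{56}
$$

Thus,

$$
\inf_{k \in \mathcal{A}} I(I^n; O^n) \le \sup_{Q \in \mathcal{P}(\mathcal{I})} \inf_{k \in \mathcal{A}} n I(Q, k) = n C'(k \in \mathcal{A}) \tag{57}
$$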



This proves (51).
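The quantity $\sup_Q \inf_k I(Q, k)$ can be illustrated on a toy family. The following is a minimal sketch, assuming a hypothetical two-member family (a binary symmetric channel and a Z-channel) and a grid search over binary input distributions; it is not the paper's construction, only an illustration of why a single input distribution must serve every channel in the family.

```python
import math

def mutual_information(p, w):
    """I(p, w) in bits for input law p and channel w[i][o] = Pr(output o | input i)."""
    n_out = len(w[0])
    out = [sum(p[i] * w[i][o] for i in range(len(p))) for o in range(n_out)]
    total = 0.0
    for i, pi in enumerate(p):
        for o in range(n_out):
            if pi > 0 and w[i][o] > 0:
                total += pi * w[i][o] * math.log2(w[i][o] / out[o])
    return total

def grid(steps):
    """Binary input distributions [q, 1 - q] on a uniform grid."""
    return [[j / steps, 1 - j / steps] for j in range(steps + 1)]

# Hypothetical compound family: a BSC and a Z-channel.
bsc = [[0.9, 0.1], [0.1, 0.9]]
z_channel = [[1.0, 0.0], [0.5, 0.5]]
family = [bsc, z_channel]

inputs = grid(200)
# sup over Q of inf over k: one input law must work for every channel.
compound = max(min(mutual_information(q, k) for k in family) for q in inputs)
# Individual capacities: the input law may be tuned to each channel.
individual = [max(mutual_information(q, k) for q in inputs) for k in family]
print(compound, individual)
```

The grid value `compound` never exceeds the smallest individual capacity, since exchanging the supremum and infimum can only increase the value.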

By previously stated arguments, it follows that the capacity of
$k \in \mathcal{A}$ is $\ge R^E_X(D)$.

Constant composition codes with distribution $T$ can be used
for reliable communication over $k \in \mathcal{A}$ at rates $< R^E_X(D)$.
$T$ is the distribution at the input of $k \in \mathcal{A}$ in the
original architecture for communication of the i.i.d. $X$ source
to within an expected distortion $D$ over $k \in \mathcal{A}$. Thus,
the channel input distribution is unchanged, and the
channel resource consumption in the architecture for reliable
communication at rates $< R^E_X(D)$ over $k \in \mathcal{A}$ is the same as
the channel resource consumption in the original architecture
for communication of the i.i.d. $X$ source to within distortion $D$
over $k \in \mathcal{A}$.



Appendix C

Details of the calculations of bounds on
probabilities of events (25) and (26) and the
symmetry argument to prove (34)



Bound for probability of event (25):



nti"'-^{^rf"'(c/7>"')>i?}) 



P^lntT-'{^d-'{u't,Y-')>D^]- 



Pr 



yd" ,Y")<D'>U 



f^tT-'[^d-'iU':\Y-')>D^^ 



>(i -w„o+ 



= +Pr ( nti"'-^ Ud"'(c/f,r"') > D 



n Pr(Urf"'(^f,r"')> 



D 

(since [/'"',! < i < 2L"''"J - 1, F"' are 

independent random variables) 

1 ^ X -,2 1 



Pr ( <{ — d" ([/" ,r" ) > D 



(where J7" is uniform on 

and is independent of F" ) 



^ p^„.(y"')Pr('id"'(C/"',y"') 

2L..'RJ_i 



> n 



^ p^„.(2/"')Pr('id"'(C/"', 



> D 



E Pr.'(2/"')Pr(^rf"'(C/ 



> D 



,L"'-RJ. 



(since ?7" and Y" are independent) 



inf Pr -d"(C/",2;") 



> D 



inf Pr(^^d"(;7",2/^" 



(58) 
The last equality above follows because



Let Vq" be independent of U" . From ( 62 1 it follows that 



Pr( ^ ld"'(f/"',2/"') >^ 



(59) 



Pr 



;d"(C/",2/")>^ 



depends only on the type of $y^{n'}$; see the symmetry argument
later in the appendix. This gives the desired bound on the
probability of event (25).



Pr 



-d" ([/" )>D 



(65) 



Bound for probability of event (26):



By an argument identical with the one used to prove (62), it
follows that



Pr n 



i=l 
2L"'RJ 



Pr 



Pr 



n ({^d-' in-', VS)>D 



(66) 



Pr 



2L"'RJ 



for m" , m'" G Z^" . From (66 1 it follows that 



1 



(60) 



Pr 



where is uniform on V^' . 

There is choice of g e Thus, a bound for the probability From (|65]l and ([67|, ([34| follows, 
of the event is 

1 „' 1 M 

(61) 



-d"(C/",l/")>Z? (67) 



inf Pr N -d" (w" ,V;" ) > D 



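This gives the desired bound on the probability of event (26).

The symmetry argument to prove (34):

First step is to prove that

$$
\Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y^{n'}) > D \right)
= \Pr\left( \frac{1}{n'} d^{n'}(U^{n'}, y'^{n'}) > D \right) \tag{62}
$$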


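for sequences $y^{n'}$ and $y'^{n'}$ with type $q$. Since $U^{n'}$ is the
uniform distribution on $\mathcal{U}^{n'}$, it follows that it is sufficient to
prove that the sets

$$
\left\{ u^{n'} : \frac{1}{n'} d^{n'}(u^{n'}, y^{n'}) > D \right\} \ \text{and} \
\left\{ u^{n'} : \frac{1}{n'} d^{n'}(u^{n'}, y'^{n'}) > D \right\} \tag{63}
$$

have the same cardinality. $y'^{n'} = \pi^{n'} y^{n'}$ for some permutation
$\pi^{n'}$ since $y^{n'}$ and $y'^{n'}$ have the same type. Denote the set

$$
B_{y^{n'}} = \left\{ u^{n'} : \frac{1}{n'} d^{n'}(u^{n'}, y^{n'}) > D \right\} \tag{64}
$$

Set $B_{y'^{n'}}$ is defined analogously.

Let $u^{n'} \in B_{y^{n'}}$. Since the distortion function is permutation
invariant, $d^{n'}(\pi^{n'} u^{n'}, \pi^{n'} y^{n'}) = d^{n'}(u^{n'}, y^{n'})$. Thus,
$\pi^{n'} u^{n'} \in B_{y'^{n'}}$. If $u^{n'} \ne u'^{n'}$, then $\pi^{n'} u^{n'} \ne \pi^{n'} u'^{n'}$.
It follows that $|B_{y'^{n'}}| \ge |B_{y^{n'}}|$. Interchanging $y^{n'}$ and $y'^{n'}$ in the above
argument, $|B_{y^{n'}}| \ge |B_{y'^{n'}}|$. It follows that $|B_{y^{n'}}| = |B_{y'^{n'}}|$.
(62) follows.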
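The counting step of the symmetry argument can be checked by brute force on a small case. The following is a minimal sketch under stated assumptions: binary alphabets, Hamming distortion as the permutation invariant distortion function, and a support set consisting of all binary sequences with a fixed number of ones. These choices, and the function names, are illustrative and not taken from the paper.

```python
from itertools import product

def hamming(u, y):
    """Additive, permutation invariant distortion: number of disagreeing positions."""
    return sum(a != b for a, b in zip(u, y))

def type_class(n, ones):
    """All binary sequences of length n with exactly `ones` ones."""
    return [u for u in product((0, 1), repeat=n) if sum(u) == ones]

def count_exceeding(y, support, threshold):
    """Number of u in the support whose average distortion to y exceeds threshold."""
    return sum(1 for u in support if hamming(u, y) / len(y) > threshold)

n = 6
support = type_class(n, 3)      # permutation invariant support for u
y1 = (1, 1, 0, 0, 0, 0)         # two sequences with the same type
y2 = (0, 0, 1, 0, 0, 1)
counts = (count_exceeding(y1, support, 0.5), count_exceeding(y2, support, 0.5))
print(counts)
```

Because the support is closed under permutations, permuting the coordinates maps the first set of sequences one-to-one onto the second, so the two counts agree.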



